Abstract

The main goal of this paper is to give a pedagogical introduction to Quantum Infor- 
mation Theory — to do this in a new way, using network diagrams called Quantum 
Bayesian Nets. A lesser goal of the paper is to propose a few new ideas, such as 
associating with each quantum Bayesian net a very useful density matrix that we call 
the meta density matrix. 
1 Introduction



The main goal of this paper is to give a pedagogical introduction to Quantum Infor- 
mation Theory — to do this in a new way, using network diagrams called Quantum 
Bayesian (QB) Nets. The paper assumes no prior knowledge of Classical[1]-[2] or
Quantum[3]-[9] Information Theory. It does assume a good understanding of the ma- 
chinery of Quantum Mechanics, such as one would obtain by reading any reasonable 
textbook that explains Dirac bra-ket formalism. The paper reviews QB nets in an 
appendix. If you have difficulty understanding said appendix, you might want to read 
Ref. [10] before continuing this paper. 

Most of the ideas discussed in this paper are not new. They are well-known, 
standard ideas invented by the pioneers (Bennett, Holevo, Peres, Schumacher, Woot- 
ters, etc.) of the field of Quantum Information Theory. What is new about this paper 
is that, whenever possible and advantageous, we rephrase those ideas in the visual 
language of QB nets. The paper does present a few new ideas, such as associating 
with each QB net a very useful density matrix that we call the meta density matrix 
of the net. 

The topics covered in this paper are shown in the Table of Contents. The pa- 
per, in its present form, is far from being a complete account of the field of Quantum 
Information Theory. Some important topics that were left out (because the author 
didn't have enough time to write them up) are: quantum compression, quantum error 
correction, channel capacities, quantum approximate cloning, entanglement quantifi- 
cation and manipulation. Future editions of this paper may include some of these 
topics. I welcome any suggestions or comments. To fill in gaps left by this paper, 
or to find alternative explanations of difficult topics, see Refs.[3]-[9] and references 
therein. 
2 Notation



In this section, we will introduce certain notation which is used throughout the paper. 

We define $Z_{a,b}$ to be the set $\{a, a+1, \ldots, b\}$ for any integers $a$ and $b$. Let
Bool = {0, 1}. For any finite set $S$, let $|S|$ denote the number of elements in $S$.

The Kronecker delta function $\delta(x, y)$ equals one if $x = y$ and zero otherwise.
We will often abbreviate $\delta(x, y)$ by $\delta^x_y$.

We will often use the symbol $\sum_r$ to mean that one must sum whatever is on the
right-hand side of this symbol over all repeated indices (a sort of Einstein summation
convention). Likewise, $\sum_{all}$ will mean that one should sum over all indices. If we
wish to exclude a particular index from the summation, we will indicate this by a
slash followed by the name of the index. For example, in $\sum_{r/f}$ or $\sum_{all/f}$ we wish to
exclude summation over $f$.

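The Pauli matrices $\sigma_x$, $\sigma_y$ and $\sigma_z$ are defined by

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} , \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} . \qquad (2.1)$$

For any real $p \in [0, 1]$, we define the binary entropy function $h(\cdot)$ by

$$h(p) = -p \log_2(p) - (1 - p) \log_2(1 - p) . \qquad (2.2)$$

When speaking of bits with states 0 and 1, we will often use an overbar to
represent the opposite state: $\bar{0} = 1$, $\bar{1} = 0$.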

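We will underline random variables. For example, we might write $P(\underline{a} = a)$
for the probability that the random variable $\underline{a}$ assumes value $a$. $P(\underline{a} = a)$ will
often be abbreviated by $P(a)$ when no confusion will arise. $S_{\underline{a}}$ will denote the set of
values which the random variable $\underline{a}$ may assume, and $N_{\underline{a}}$ will denote the number of
elements in $S_{\underline{a}}$. With each random variable $\underline{a}$, we will associate an orthonormal basis
$\{|a\rangle : a \in S_{\underline{a}}\}$ which we will call the $\underline{a}$ basis. We will represent by $\mathcal{H}_{\underline{a}}$ the Hilbert space
spanned by the $\underline{a}$ basis. $|\underline{a} = a\rangle$ will mean the same thing as $|a\rangle$; $|\underline{a} = a\rangle$ is just a
more explicit notation that indicates that $|a\rangle$ belongs to $\mathcal{H}_{\underline{a}}$. If $\underline{x}_1, \underline{x}_2, \ldots, \underline{x}_N$ are any
random variables, we will use $\mathcal{H}_{\underline{x}_1, \underline{x}_2, \ldots, \underline{x}_N}$ to denote $\mathcal{H}_{\underline{x}_1} \otimes \mathcal{H}_{\underline{x}_2} \otimes \cdots \otimes \mathcal{H}_{\underline{x}_N}$.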

Whenever we use the word "ditto", as in "X (ditto, Y)", we mean that the 
statement is true if X is replaced by Y. For example, if we say "A (ditto, X) is smaller 
than B (ditto, Y)", we mean "A is smaller than B" and "X is smaller than Y". 

This paper will also utilize certain notation associated with classical and quan- 
tum Bayesian nets. See Appendix A for a review of such notation. 
3 Classical Entropy: Its Definition and Properties



In this section, we will define various classical entropies associated with a CB net. 

Suppose $p_1, p_2, \ldots, p_N$ are non-negative numbers which add up to one. The
classical entropy $H(p)$ of $p$ is defined by

$$H(p) = -\sum_{i=1}^{N} p_i \log_2 p_i . \qquad (3.1)$$

$H(p)$ measures the spread of the probability distribution $p$.

In Thermodynamics, entropy measures the disorder of a macroscopic system. 
See Ref. [5] for a discussion of the relationship between the entropy of Thermodynamics 
and Eq.(3.1). 

In Communication Theory, one uses the words "information" and "entropy" 
interchangeably. In the context of communication theory, the word "information" 
means information content of an average message. Given any random variable x, 
one may think of a sequence $x_1, x_2, \ldots, x_N$ of samples of $x$ as a message. Then one
makes the assumption that the more information an average message (of fixed length)
carries, the higher the variance of $x$ will be, and vice versa. Eq.(3.1) quantifies the
variance of $x$ if we replace $p_i$ and the sum over $1 \le i \le N$ by $P(\underline{x} = x)$ and a sum
over $x \in S$, where $S$ is the set of values that $x$ can assume.
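Eq.(3.1) can be evaluated numerically with a few lines of code. The following is a minimal sketch; the function name `shannon_entropy` and the sample distributions are illustrative choices, not part of the paper.

```python
import math

def shannon_entropy(probs):
    """Classical entropy H(p) = -sum_i p_i log2 p_i of Eq.(3.1), in bits."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must add up to one")
    # Terms with p_i = 0 contribute nothing, since p log2 p -> 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])  # maximal spread for N = 4
peaked = shannon_entropy([1.0, 0.0, 0.0, 0.0])       # no spread at all
biased = shannon_entropy([0.9, 0.1])                 # binary entropy h(0.9) of Eq.(2.2)
```

The uniform distribution attains the largest value, log2(N), and a distribution concentrated on a single outcome gives zero, in line with the reading of H(p) as a measure of spread.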

When dealing with a CB net, it is convenient to rephrase Eq.(3.1) in terms of 
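the node random variables of the net. Consider a CB net $\mathcal{N}^C$ with nodes labelled by
the random variables $x_1, x_2, \ldots, x_N$. These random variables are related by a joint
probability distribution $P(x.)$. Suppose $\Gamma_1$ and $\Gamma_2$ are non-empty subsets of $Z_{1,N}$.
$\Gamma_1$ and $\Gamma_2$ need not be disjoint. The probability distributions $P[(x.)_{\Gamma_1}]$, $P[(x.)_{\Gamma_2}]$
and $P[(x.)_{\Gamma_1 \cup \Gamma_2}]$ can be obtained by summing $P(x.)$ over the unwanted arguments, a
process called marginalization. We define:

$$H[(x.)_{\Gamma_1}] = -\sum_{(x.)_{\Gamma_1}} P[(x.)_{\Gamma_1}] \log_2 P[(x.)_{\Gamma_1}] , \qquad (3.2)$$

$$H[(x.)_{\Gamma_1} | (x.)_{\Gamma_2}] = -\sum_{(x.)_{\Gamma_1 \cup \Gamma_2}} P[(x.)_{\Gamma_1 \cup \Gamma_2}] \log_2 \left( \frac{P[(x.)_{\Gamma_1 \cup \Gamma_2}]}{P[(x.)_{\Gamma_2}]} \right) , \qquad (3.3)$$

$$H[(x.)_{\Gamma_1} : (x.)_{\Gamma_2}] = \sum_{(x.)_{\Gamma_1 \cup \Gamma_2}} P[(x.)_{\Gamma_1 \cup \Gamma_2}] \log_2 \left( \frac{P[(x.)_{\Gamma_1 \cup \Gamma_2}]}{P[(x.)_{\Gamma_1}] P[(x.)_{\Gamma_2}]} \right) . \qquad (3.4)$$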

For example, if $a$ and $b$ are nodes of a CB net, then

$$H(a) = -\sum_{a} P(a) \log_2 P(a) , \qquad (3.5)$$

$$H(a, b) = -\sum_{a,b} P(a, b) \log_2 P(a, b) , \qquad (3.6)$$

$$H(a|b) = -\sum_{a,b} P(a, b) \log_2 P(a|b) , \qquad (3.7)$$

$$H(a : b) = \sum_{a,b} P(a, b) \log_2 \left( \frac{P(a, b)}{P(a) P(b)} \right) , \qquad (3.8)$$

where $P(a) = \sum_b P(a, b)$, $P(b) = \sum_a P(a, b)$, and the sums over $a$ (ditto, $b$) range
over all $a \in S_a$ (ditto, $b \in S_b$).

Note that definitions Eqs.(3.2) to (3.4) are independent of the order of the
node random variables within $(x.)_{\Gamma_1}$ and $(x.)_{\Gamma_2}$. For example, if $a$, $b$, $c$ are nodes of a
CB net, then

$$H(a, b, c) = H(a, c, b) , \quad H[a|(b, c)] = H[a|(c, b)] . \qquad (3.9)$$

It is convenient to extend definitions Eqs.(3.2) to (3.4) in the following two ways.
First, we will allow $(x.)_{\Gamma_1}$ (ditto, $(x.)_{\Gamma_2}$) to contain repeated random variables. If it
does, then we will throw out any extra copies of a random variable. For example, if
$a$, $b$, $c$ are nodes of a CB net, then

$$H(a, a, b, c) = H(a, b, c) , \quad H[a|(b, b, c)] = H[a|(b, c)] . \qquad (3.10)$$

Second, we will allow $(x.)_{\Gamma_1}$ (ditto, $(x.)_{\Gamma_2}$) to contain internal parentheses. If it does,
then we will ignore the internal parentheses. For example, if $a$, $b$, $c$, $d$ are nodes of a
CB net, then

$$H[(a, b), c] = H(a, b, c) , \quad H[a|((b, c), d)] = H[a|(b, c, d)] . \qquad (3.11)$$

Let $X = (x.)_{\Gamma_1}$ and $Y = (x.)_{\Gamma_2}$. $H(X)$ measures the spread of the $P(X)$
distribution. $H(X|Y)$ is called the conditional entropy of $X$ given $Y$. $H(X : Y)$ is
called the mutual entropy of $X$ and $Y$, and it measures the dependency of $X$ and $Y$:
it is non-negative, and it equals zero iff $X$, $Y$ are independent random variables (i.e.,
$P(X, Y) = P(X)P(Y)$ for all $X \in S_X$ and $Y \in S_Y$).

Note that Eqs.(3.2) to (3.4) imply that

$$H(X|Y) = H(X, Y) - H(Y) , \qquad (3.12)$$

$$H(X : Y) = H(X) + H(Y) - H(X, Y) , \qquad (3.13)$$

$$H(X : Y) = H(X) - H(X|Y) , \qquad (3.14)$$

$$H(X : Y) = H(Y) - H(Y|X) . \qquad (3.15)$$
In Eq.(3.14), one may think of $H(X)$ as the information about $X$ prior to transmitting
it, and $H(X|Y)$ as the information about $X$ once $X$ is transmitted and $Y$ is found
out. Since $H(X : Y)$ is the difference between the two, one may think of it as the
information (or entropy) "transmitted" from $X$ to $Y$. This interpretation of $H(X : Y)$
is an alternative to the dependency interpretation mentioned above.
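The identities Eqs.(3.12) to (3.14) can be checked numerically against the definitions Eqs.(3.7) and (3.8). The sketch below does this for two binary nodes; the joint distribution `P_ab` and the helper names `entropy` and `marginalize` are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict

def entropy(dist):
    """H = -sum P log2 P over the entries of a distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginalize(joint, keep):
    """Sum the joint distribution over all arguments whose positions are not in keep."""
    out = defaultdict(float)
    for values, p in joint.items():
        out[tuple(values[i] for i in keep)] += p
    return dict(out)

# Hypothetical joint distribution P(a, b) for two binary nodes.
P_ab = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
P_a = marginalize(P_ab, (0,))
P_b = marginalize(P_ab, (1,))

# Conditional and mutual entropies straight from Eqs.(3.7) and (3.8).
H_a_given_b = -sum(p * math.log2(p / P_b[(b,)])
                   for (a, b), p in P_ab.items() if p > 0)
H_a_colon_b = sum(p * math.log2(p / (P_a[(a,)] * P_b[(b,)]))
                  for (a, b), p in P_ab.items() if p > 0)

# The same quantities obtained from the identities.
via_3_12 = entropy(P_ab) - entropy(P_b)                 # H(a|b)
via_3_13 = entropy(P_a) + entropy(P_b) - entropy(P_ab)  # H(a:b)
via_3_14 = entropy(P_a) - H_a_given_b                   # H(a:b)
```

For this distribution the mutual entropy is strictly positive, as expected, since P(a, b) does not factor into P(a)P(b).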

Let $X = (x.)_{\Gamma_1}$, $Y = (x.)_{\Gamma_2}$ and $Z = (x.)_{\Gamma_3}$, where $\Gamma_1$, $\Gamma_2$, $\Gamma_3$ are non-empty,
possibly overlapping, subsets of $Z_{1,N}$. We can extend further the domain of
the function $H(\cdot)$ by introducing the following axioms:

$$H[(X, Y) : Z] = H[(X : Z), (Y : Z)] , \qquad (3.16)$$

$$H[(X : Y), Z] = H[(X, Z) : (Y, Z)] . \qquad (3.17)$$

Eq.(3.16) means that ":" distributes over ",". According to Eq.(3.13), the LEFT
hand side of Eq.(3.16) equals $H(X, Y) + H(Z) - H(X, Y, Z)$. Eq.(3.17) means that
"," distributes over ":". According to Eq.(3.13), the RIGHT hand side of Eq.(3.17)
equals $H(X, Z) + H(Y, Z) - H(X, Y, Z)$. With the help of the above distributive
laws, the entropy of a compound expression with any number of ":" and "|" operators
can be expressed as a sum of $(\pm 1) H(\cdot)$ functions containing "," but not containing
":" and "|" in their arguments. For example, if $a$, $b$, $c$ are nodes of a CB net, then

$$\begin{aligned}
H[(a : b)|c] &= H[(a : b), c] - H(c) \\
&= H[(a, c) : (b, c)] - H(c) \\
&= H(a, c) + H(b, c) - H(a, b, c) - H(c) .
\end{aligned} \qquad (3.18)$$

If some parentheses are omitted within the argument of $H(\cdot)$, the argument
may become ambiguous. For example, does $H(a : b, c)$ mean $H((a : b), c)$ or
$H(a : (b, c))$? Ambiguous arguments should be interpreted using the following operator
precedence order, from highest to lowest precedence: comma (,), colon (:), vertical
line (|). Thus, $H(a : b, c)$ should be interpreted as $H(a : (b, c))$.
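As a short worked illustration of the precedence rule, consider an argument that uses all three operators. This is only a sketch: the node names $a$, $b$, $c$, $d$ are arbitrary, and the steps use nothing beyond Eqs.(3.12), (3.13) and (3.17). The unparenthesized argument $a : b, c \,|\, d$ is grouped as $(a : (b, c)) \,|\, d$, so that

```latex
\begin{align*}
H(a : b, c \,|\, d) &= H[(a : (b, c)) \,|\, d] \\
  &= H[(a : (b, c)), d] - H(d) \\
  &= H[(a, d) : (b, c, d)] - H(d) \\
  &= H(a, d) + H(b, c, d) - H(a, b, c, d) - H(d) .
\end{align*}
```

The final line contains only "," in the arguments of $H(\cdot)$, as the distributive laws promise.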

In the mathematical field called Set Theory, one defines the union $A \cup B$, the
intersection $A \cap B$ and the difference $A - B = A \cap \mathrm{complement}(B)$ of two sets $A$
and $B$. One also defines functions $\mu(\cdot)$ called measures. A measure $\mu(\cdot)$ assigns a
non-negative real number to any "measurable" set $A$. $\mu(\cdot)$ satisfies

$$\mu(\emptyset) = 0 , \qquad (3.19)$$

$$\mu\left( \bigcup_{i=0}^{\infty} E_i \right) = \sum_{i=0}^{\infty} \mu(E_i) , \qquad (3.20)$$

where $\emptyset$ is the empty set, and the $E_i$'s are disjoint measurable sets. For example,
for any set $S = [a_1, b_1] \cup [a_2, b_2] \cup \ldots \cup [a_N, b_N]$, where the $[a_i, b_i]$'s are disjoint closed
intervals of real numbers, one can define $\mu(S) = \sum_{i=1}^{N} (b_i - a_i)$.
There is a close analogy between the properties of entropy functions in Information
Theory (IT) and those of measure functions in Set Theory (ST). If $A$, $B$ are
sets and $a$, $b$ are node random variables, then it is fruitful to imagine the following
correspondences [11]:

$$\begin{array}{lrcl}
\text{atoms:} & A & \leftrightarrow & a \\
\text{binary operators:} & A \cup B & \leftrightarrow & (a, b) \\
 & A \cap B & \leftrightarrow & (a : b) \\
 & A - B & \leftrightarrow & (a|b) \\
\text{real-valued function:} & \mu(A) & \leftrightarrow & H(a)
\end{array} \qquad (3.21)$$

In both ST and IT, one defines a real-valued function (i.e., $\mu(\cdot)$ in ST versus $H(\cdot)$
in IT). This real-valued function takes as arguments certain well-formed expressions.
A well-formed expression consists of either a single atom (a set in ST versus a node
random variable in IT) or a compound expression. A compound expression is formed
by using binary operators ($\cup$, $\cap$, $-$ in ST versus "," ":" "|" in IT) to bind together either
(1) two atoms or (2) an atom and another compound expression or (3) two compound
expressions.

Table 1 gives a list of properties (identities and inequalities) satisfied by the
classical entropy $H(\cdot)$. Whenever possible, Table 1 matches each property of entropy
functions with an analogous property of measure functions. See Refs.[1]-[9] to get
proofs of those statements in Table 1 that are not proven in this paper.
Table 1. ENTROPY PROPERTIES (compiled by R.R.Tucci, report errors to tucci@ar-tiste.com)

| Measure $\mu(\cdot)$ (ST) | Classical entropy $H(\cdot)$ (IT) | Quantum entropy $S_\rho(\cdot)$ |
|---|---|---|
| $\mu(A - B) = \mu(A \cup B) - \mu(B)$ ($-$ in terms of $\cup$) | $H(X \mid Y) = H(X, Y) - H(Y)$ | $H \to S_\rho$ |
| $\mu(A \cap B) = \mu(A) + \mu(B) - \mu(A \cup B)$ ($\cap$ in terms of $\cup$) | $H(X : Y) = H(X) + H(Y) - H(X, Y)$ | $H \to S_\rho$ |
| $\mu((A \cup B) \cap C) = \mu((A \cap C) \cup (B \cap C))$ ($\cap$ distributes over $\cup$) | $H((X, Y) : Z) = H((X : Z), (Y : Z))$ | $H \to S_\rho$ |
| $\mu((A \cap B) \cup C) = \mu((A \cup C) \cap (B \cup C))$ ($\cup$ distributes over $\cap$) | $H((X : Y), Z) = H((X, Z) : (Y, Z))$ | $H \to S_\rho$ |
| $0 \le \mu(A)$ (non-negative) | $0 \le H(X) \le \log_2 N_X$. $H(X) = 0$ iff $P(X_0) = 1$ for some $X_0 \in S_X$, and $P(X) = 0$ for all other $X \in S_X$. $H(X) = \log_2 N_X$ iff $P(X) = 1/N_X$ for all $X$. | $H \to S_\rho$. Let $\rho' = \mathrm{tr}\,\rho$, where trace is over all random variables except $X$. $S_\rho(X) = 0$ iff $\rho'$ is pure. $S_\rho(X) = \log_2 N_X$ iff $\rho' = I/N_X$. |
| $\mu(B) \le \mu(A \cup B)$ or $0 \le \mu(A - B)$ | $H(Y) \le H(X, Y)$ or $0 \le H(X \mid Y)$. Equality iff $X = f(Y)$ for some function $f(\cdot)$. | $\lvert S_\rho(X) - S_\rho(Y) \rvert \le S_\rho(X, Y)$. Triangle Inequality (Araki-Lieb). $S_\rho(X \mid Y)$ may be negative! Let $\rho' = \mathrm{tr}\,\rho$, where trace is over all random variables except $X$, $Y$. Equality iff $\rho'$ is pure. Schmidt Decomp. applies when $\rho'$ is pure. |
| $\mu(A \cup B) \le \mu(A) + \mu(B)$ or $\mu(A - B) \le \mu(A)$ or $0 \le \mu(A \cap B)$ (sub-additivity) | $H(X, Y) \le H(X) + H(Y)$ or $H(X \mid Y) \le H(X)$ or $0 \le H(X : Y)$. Equality iff $X$ and $Y$ are independent. | $H \to S_\rho$. Let $\rho' = \mathrm{tr}\,\rho$, where trace is over all random variables except $X$, $Y$. Equality iff $\rho' = (\mathrm{tr}_X \rho')(\mathrm{tr}_Y \rho')$. |
| $\mu(A - (B \cup C)) \le \mu(A - B)$ (strong sub-additivity) | $H(X \mid (Y, Z)) \le H(X \mid Y)$ | $H \to S_\rho$ (Lieb-Ruskai) |
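| | | $S(U \rho U^\dagger) = S(\rho)$ for any unitary matrix $U$. Thus, if $\rho$ has eigenvalues $p_j$, then $S(\rho) = H(p)$. |
| | | $S(\rho) \le H(p)$, where $p_i = \langle i \vert \rho \vert i \rangle$. Equality iff $\langle i \vert \rho \vert j \rangle = 0$ for all $i \ne j$. |
| | | $S(\sum_j p_j \vert j \rangle \langle j \vert) \le H(p)$, where $p_j$ is a prob. distribution. Equality iff $\langle j \vert j' \rangle = \delta(j, j')$. |
| | $-\sum_j p_j (\log_2 p_j - \log_2 q_j) \le 0$, where $p_j$ and $q_j$ are prob. distributions. Gibbs' inequality. Equality iff $q_j = p_j$ for all $j$. | $-\mathrm{tr}(\rho (\log_2 \rho - \log_2 \sigma)) \le 0$, where $\rho$, $\sigma$ are density matrices. Equality iff $\rho = \sigma$. |
| | $\sum_\alpha w_\alpha H(p_\alpha) \le H(\sum_\alpha w_\alpha p_\alpha)$, where $w_\alpha \ge 0$ and $\sum_\alpha w_\alpha = 1$. Concavity. Equality iff $\exists p$ such that $\forall \alpha$, $p_\alpha = p$. | $\sum_\alpha w_\alpha S(\rho_\alpha) \le S(\sum_\alpha w_\alpha \rho_\alpha)$, where $w_\alpha \ge 0$ and $\sum_\alpha w_\alpha = 1$. Concavity. Equality iff $\exists \rho$ such that $\forall \alpha$, $\rho_\alpha = \rho$. |
| | $H(\sum_\alpha w_\alpha p_\alpha) \le \sum_\alpha w_\alpha H(p_\alpha) - \sum_\alpha w_\alpha \log_2 w_\alpha$, where $w_\alpha \ge 0$ and $\sum_\alpha w_\alpha = 1$. Equality iff $p_\alpha \cdot p_\beta = 0$ for $\alpha \ne \beta$. Equality is Shannon grouping axiom for $H(\cdot)$. | $S(\sum_\alpha w_\alpha \rho_\alpha) \le \sum_\alpha w_\alpha S(\rho_\alpha) - \sum_\alpha w_\alpha \log_2 w_\alpha$, where $w_\alpha \ge 0$ and $\sum_\alpha w_\alpha = 1$. Equality iff $\rho_\alpha \rho_\beta = 0$ for $\alpha \ne \beta$. Lanford-Robinson. |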
4 CB Net Examples



In Section 3, we discussed entropic properties which are valid for all CB nets. In this
section, we will discuss entropic properties that apply to particular CB nets.

First, we will consider all possible CB nets with 2 and 3 nodes. Their nodes 
will be labelled by the random variables a, b, c. 



Figure 4.1: Two connected nodes, a and b.



Fig.(4.1) shows two connected nodes. By the definition of CB nets, the joint
probability $P(a, b)$ of the two nodes of this net satisfies:

$$P(a, b) = P(b|a) P(a) . \qquad (4.1)$$

Taking the logarithms and then the expected values of both sides of the last equation
yields

$$H(a, b) = H(b|a) + H(a) . \qquad (4.2)$$




Figure 4.2: Diverging graph with 3 nodes. 

Fig.(4.2) shows a "diverging" graph with 3 nodes. By the definition of CB
nets, the joint probability $P(a, b, c)$ of all the nodes of this net satisfies:

$$P(a, b, c) = P(b) P(a|b) P(c|b) . \qquad (4.3)$$

The last equation implies the following entropic constraint:

$$\begin{aligned}
H(a, b, c) &= H(b) + H(a|b) + H(c|b) \\
&= H(a, b) + H(c, b) - H(b) ,
\end{aligned} \qquad (4.4)$$

which is equivalent to

$$H[(a : c)|b] = 0 . \qquad (4.5)$$

This means that at a fixed value of $b$, $a$ and $c$ are independent random variables.

Figure 4.3: Converging graph with 3 nodes.

Fig.(4.3) shows a "converging" graph with 3 nodes. $P(a, b, c)$ for this net must
satisfy

$$P(a, b, c) = P(a) P(c) P(b|a, c) . \qquad (4.6)$$

Thus,

$$\begin{aligned}
H(a, b, c) &= H(a) + H(c) + H(b|a, c) \\
&= H(a) + H(c) - H(a, c) + H(a, b, c) ,
\end{aligned} \qquad (4.7)$$

which is equivalent to

$$H(a : c) = 0 . \qquad (4.8)$$

This means that $a$ and $c$ are independent.
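The constraint Eq.(4.8) is easy to confirm numerically. The sketch below builds a joint distribution of the form Eq.(4.6) from made-up probability tables (the numbers and the names `P_a`, `P_c`, `P_b_given_ac` are assumptions for illustration) and evaluates $H(a : c)$ via Eq.(3.13).

```python
import math
from collections import defaultdict
from itertools import product

def entropy(joint, keep):
    """Entropy of the marginal of `joint` on the argument positions in `keep`."""
    marg = defaultdict(float)
    for values, p in joint.items():
        marg[tuple(values[i] for i in keep)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

# Hypothetical tables for the converging graph a -> b <- c (all nodes binary).
P_a = [0.3, 0.7]
P_c = [0.6, 0.4]
# Indexed as P_b_given_ac[a][c][b].
P_b_given_ac = [[[0.9, 0.1], [0.5, 0.5]], [[0.2, 0.8], [0.7, 0.3]]]

# Eq.(4.6); tuple positions are (a, b, c).
joint = {(a, b, c): P_a[a] * P_c[c] * P_b_given_ac[a][c][b]
         for a, b, c in product(range(2), repeat=3)}

A, B, C = 0, 1, 2
H_a_colon_c = entropy(joint, (A,)) + entropy(joint, (C,)) - entropy(joint, (A, C))
```

Up to floating-point rounding, `H_a_colon_c` vanishes for any choice of the three tables, because summing Eq.(4.6) over $b$ leaves $P(a)P(c)$.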



Figure 4.4: Three node Markov chain.

A Bayesian net consisting of a simple chain of $N$ nodes connected by arrows
all pointing in the same direction will be called an $N$ node Markov chain. If the nodes
are labelled by random variables $q_1, q_2, \ldots, q_N$, we will denote the net by
$q_1 \to q_2 \to \cdots \to q_N$. Fig.(4.4) shows a 3 node Markov chain $a \to b \to c$. $P(a, b, c)$ for this net
must satisfy:

$$P(a, b, c) = P(c|b) P(b|a) P(a) . \qquad (4.9)$$

Thus,

$$\begin{aligned}
H(a, b, c) &= H(c|b) + H(b|a) + H(a) \\
&= H(c, b) - H(b) + H(b, a) ,
\end{aligned} \qquad (4.10)$$

which is equivalent to

$$H[(a : c)|b] = 0 . \qquad (4.11)$$

Note that Eq.(4.11) for the Markov chain Fig.(4.4) is the same as Eq.(4.5) for the
diverging graph Fig.(4.2). This shows that two CB nets with different topologies can
have the same entropic constraint.
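A numerical sketch makes the shared constraint concrete. Below, a diverging net (Eq.(4.3)) and a Markov chain (Eq.(4.9)) are built from made-up probability tables, and $H[(a : c)|b]$ is evaluated with the expansion Eq.(3.18). A third distribution that obeys neither factorization is included for contrast. All tables and helper names are illustrative assumptions.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(joint, keep):
    """Entropy of the marginal of `joint` on the argument positions in `keep`."""
    marg = defaultdict(float)
    for values, p in joint.items():
        marg[tuple(values[i] for i in keep)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def cmi_a_c_given_b(joint):
    """H[(a:c)|b] expanded as in Eq.(3.18); tuple positions are (a, b, c)."""
    return (entropy(joint, (0, 1)) + entropy(joint, (2, 1))
            - entropy(joint, (0, 1, 2)) - entropy(joint, (1,)))

triples = list(product(range(2), repeat=3))

# Diverging graph, Eq.(4.3): P(a,b,c) = P(b) P(a|b) P(c|b).
P_b = [0.3, 0.7]
P_a_given_b = [[0.9, 0.1], [0.2, 0.8]]   # indexed [b][a]
P_c_given_b = [[0.6, 0.4], [0.5, 0.5]]   # indexed [b][c]
diverging = {(a, b, c): P_b[b] * P_a_given_b[b][a] * P_c_given_b[b][c]
             for a, b, c in triples}

# Markov chain, Eq.(4.9): P(a,b,c) = P(c|b) P(b|a) P(a).
P_a = [0.5, 0.5]
P_b_given_a = [[0.9, 0.1], [0.3, 0.7]]   # indexed [a][b]
chain = {(a, b, c): P_c_given_b[b][c] * P_b_given_a[a][b] * P_a[a]
         for a, b, c in triples}

# A distribution with c = a XOR b and a, b uniform: neither factorization holds.
xor_net = {(a, b, a ^ b): 0.25 for a, b in product(range(2), repeat=2)}

results = {name: cmi_a_c_given_b(j)
           for name, j in [("diverging", diverging), ("chain", chain), ("xor", xor_net)]}
```

The first two values vanish up to rounding, while the XOR example gives one bit: once $b$ is fixed, $c$ is completely determined by $a$.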




Figure 4.5: Fully connected 3 node graph.

Fig.(4.5) shows a fully connected 3 node graph. $P(a, b, c)$ for this net must
satisfy:

$$P(a, b, c) = P(c|b, a) P(b|a) P(a) . \qquad (4.12)$$

Because the graph is fully connected, Eq.(4.12) is a tautology: it is satisfied by all
probability distributions $P(a, b, c)$. Eq.(4.12) implies

$$H(a, b, c) = H(c|b, a) + H(b|a) + H(a) . \qquad (4.13)$$

Figure 4.6: Fully connected 4 node graph.
Eq.(4.13) can be easily generalized to any number $N \ge 2$ of nodes. Consider a
fully connected CB net with nodes labelled by the random variables $x_1, x_2, \ldots, x_N$.
Fig.(4.6) shows the case $N = 4$. By the definition of CB nets, the joint probability
of all the nodes must satisfy:

$$P(x_1, x_2, \ldots, x_N) = \prod_{j=1}^{N} P(x_j | x_{j-1}, \ldots, x_2, x_1) . \qquad (4.14)$$

Thus,

$$H(x_1, x_2, \ldots, x_N) = \sum_{j=1}^{N} H(x_j | x_{j-1}, \ldots, x_2, x_1) . \qquad (4.15)$$
Consider a 3 node Markov chain $q_1 \to q_2 \to q_3$. We shall demonstrate that:

$$0 = H(q_1|q_1) \le H(q_1|q_2) \le H(q_1|q_3) , \qquad (4.16)$$

and

$$H(q_1) = H(q_1 : q_1) \ge H(q_1 : q_2) \ge H(q_1 : q_3) . \qquad (4.17)$$

Eqs.(4.16) and (4.17) will be called fixed sender (or speaker) data processing (DP)
inequalities. Eq.(4.16) tells us that the conditional entropy of $q_1$ given $q_j$ increases as "time" $j$ increases,
because the "memory" $q_j$ of $q_1$ becomes a progressively less faithful representation of
the original. Eq.(4.17) tells us that the dependency of $q_j$ on $q_1$ decreases as "time" $j$
increases. Alternatively, one might say that the amount of information transmitted
from $q_1$ to $q_j$ decreases as the "distance" $j$ increases. The farther away the receiver is
from the sender, the less information it gets. Eq.(4.17) follows trivially from Eq.(4.16):
just subtract $H(q_1)$ from each term of Eq.(4.16) and multiply the whole string of
inequalities by $-1$. To prove Eq.(4.16), we begin by noticing that

$$P(q_1|q_2, q_3) = \frac{P(q_3|q_2) P(q_2|q_1) P(q_1)}{P(q_3|q_2) P(q_2)} = \frac{P(q_2|q_1) P(q_1)}{P(q_2)} = P(q_1|q_2) . \qquad (4.18)$$

This just means that once $q_2$ is known, finding out $q_3$ adds nothing new to our
knowledge of $q_1$. Eq.(4.18) implies

$$H(q_1|q_2, q_3) = H(q_1|q_2) . \qquad (4.19)$$

Using the last equation and strong sub-additivity, we obtain

$$H(q_1|q_2) = H(q_1|q_2, q_3) \le H(q_1|q_3) . \qquad (4.20)$$

QED.

The Markov chain $q_1 \to q_2 \to q_3$ also satisfies

$$H(q_3|q_2) \le H(q_3|q_1) , \qquad (4.21)$$

and

$$H(q_3 : q_2) \ge H(q_3 : q_1) . \qquad (4.22)$$

Eqs.(4.21) and (4.22) will be called fixed receiver (or listener) data processing (DP)
inequalities. As in the fixed sender case, Eq.(4.22) follows trivially from Eq.(4.21).
Just subtract $H(q_3)$ from each term of the inequality and multiply by $-1$. To prove
Eq.(4.21), we first realize that the method employed in Eq.(4.18) can be used to show
that

$$P(q_3|q_2, q_1) = P(q_3|q_2) . \qquad (4.23)$$

Whereas in the fixed sender case, Eq.(4.18) told us that we need only condition on
the closest of the later times, Eq.(4.23) instructs us to condition only on the closest
of the earlier times. Eq.(4.23) implies

$$H(q_3|q_2, q_1) = H(q_3|q_2) . \qquad (4.24)$$

Using the last equation and strong sub-additivity, we obtain

$$H(q_3|q_2) = H(q_3|q_2, q_1) \le H(q_3|q_1) . \qquad (4.25)$$

QED.

Eqs.(4.17) and (4.22) can be stated simultaneously as

$$H(q_1 : q_3) \le \min\{ H(q_1 : q_2) , H(q_2 : q_3) \} . \qquad (4.26)$$

Consider the 4 node Markov chain $q_1 \to q_2 \to q_3 \to q_4$. Then

$$H(q_1 : q_4) \le H(q_2 : q_3) . \qquad (4.27)$$

This follows from

$$H(q_1 : q_4) \le H(q_1 : q_3) \le H(q_2 : q_3) , \qquad (4.28)$$

where we have used the fixed sender DP inequality first and the fixed receiver DP
inequality second.
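The DP inequalities can be checked on a concrete chain. The following sketch assumes a 3 node Markov chain with binary nodes and made-up transition tables (`P_1`, `T_21`, `T_32` are hypothetical names and values), and evaluates the quantities appearing in Eqs.(4.16), (4.21) and (4.26).

```python
import math
from collections import defaultdict
from itertools import product

def entropy(joint, keep):
    """Entropy of the marginal of `joint` on the argument positions in `keep`."""
    marg = defaultdict(float)
    for values, p in joint.items():
        marg[tuple(values[i] for i in keep)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def mutual(joint, i, j):
    """H(q_i : q_j) via Eq.(3.13)."""
    return entropy(joint, (i,)) + entropy(joint, (j,)) - entropy(joint, (i, j))

def conditional(joint, i, j):
    """H(q_i | q_j) via Eq.(3.12)."""
    return entropy(joint, (i, j)) - entropy(joint, (j,))

# Hypothetical chain q1 -> q2 -> q3; tuple positions 0, 1, 2 hold q1, q2, q3.
P_1 = [0.5, 0.5]
T_21 = [[0.9, 0.1], [0.2, 0.8]]   # T_21[q1][q2] = P(q2|q1)
T_32 = [[0.7, 0.3], [0.4, 0.6]]   # T_32[q2][q3] = P(q3|q2)
chain = {(q1, q2, q3): T_32[q2][q3] * T_21[q1][q2] * P_1[q1]
         for q1, q2, q3 in product(range(2), repeat=3)}

I_12, I_23, I_13 = mutual(chain, 0, 1), mutual(chain, 1, 2), mutual(chain, 0, 2)
fixed_sender_ok = conditional(chain, 0, 1) <= conditional(chain, 0, 2) + 1e-12    # Eq.(4.16)
fixed_receiver_ok = conditional(chain, 2, 1) <= conditional(chain, 2, 0) + 1e-12  # Eq.(4.21)
combined_ok = I_13 <= min(I_12, I_23) + 1e-12                                     # Eq.(4.26)
```

Changing the tables changes the three mutual entropies, but the end-to-end value `I_13` never exceeds either single-step value.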

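It is also interesting to note that the fixed receiver and fixed sender DP
inequalities are related by time reversal. Indeed, suppose we are given a 3 node
Markov chain $q_1 \to q_2 \to q_3$. Then we can extend it to a 5 node Markov chain
$q_1 \to q_2 \to q_3 \to q'_2 \to q'_1$. We need to define the set of states and the transition
matrices for nodes $q'_2$ and $q'_1$. Suppose we do this as follows:

$$S_{q'_2} = S_{q_2} , \qquad (4.29a)$$

$$S_{q'_1} = S_{q_1} , \qquad (4.29b)$$

$$P(\underline{q}'_2 = q_2 | \underline{q}_3 = q_3) = P(\underline{q}_2 = q_2 | \underline{q}_3 = q_3) = \frac{\sum_{q_1} P(q_1, q_2, q_3)}{\sum_{q_1, q_2} P(q_1, q_2, q_3)} , \qquad (4.30a)$$

$$P(\underline{q}'_1 = q_1 | \underline{q}'_2 = q_2) = P(\underline{q}_1 = q_1 | \underline{q}_2 = q_2) = \frac{\sum_{q_3} P(q_1, q_2, q_3)}{\sum_{q_1, q_3} P(q_1, q_2, q_3)} , \qquad (4.30b)$$

where

$$P(q_1, q_2, q_3) = P(q_3|q_2) P(q_2|q_1) P(q_1) . \qquad (4.31)$$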

Then, applying the fixed sender DP inequality to the chain $q_3 \to q'_2 \to q'_1$ leads to the fixed receiver one:

$$H(q_3|q'_2) \le H(q_3|q'_1) \quad \Longrightarrow \quad H(q_3|q_2) \le H(q_3|q_1) . \qquad (4.32)$$

Can the DP inequalities, which are reminiscent of the Second Law of Thermodynamics,
be generalized easily and naturally to Bayesian nets more complicated
than merely Markov chains? Such a generalization could turn out to be very useful.
After all, the Second Law of Thermodynamics is an extremely useful result. See [12]
for a generalization.
5 Reduced Density Matrices



In preparation for the next section, we will show in this section how to use a density 
matrix to generate a new, "reduced" density matrix. The Hilbert space acted upon 
by the reduced density matrix will have smaller dimension than the Hilbert space 
acted upon by the progenitor density matrix. 

Recall that a density matrix is an operator $\rho$ acting on a Hilbert space $\mathcal{H}$. In
addition, $\rho$ must be a Hermitian operator with unit trace and non-negative eigenvalues.
An operator with non-negative eigenvalues is called a non-negative (or positive
semidefinite) operator. Note that if $\sigma$ is a Hermitian operator that acts on a Hilbert
space $\mathcal{H}$, then $\sigma$ has non-negative eigenvalues iff $\langle \phi | \sigma | \phi \rangle \ge 0$ for all $|\phi\rangle \in \mathcal{H}$. This is
why. Let's represent $\sigma$ by a matrix and the elements of $\mathcal{H}$ by column vectors. Matrix
$\sigma$ can be expressed as $\sigma = U \Lambda U^\dagger$, where $U$ is a unitary matrix and $\Lambda$ is a diagonal
matrix whose diagonal entries $\lambda_i$ are the eigenvalues of $\sigma$. If $\phi$ is any vector in $\mathcal{H}$,
and $v_i$ are the components of vector $v = U^\dagger \phi$, then

$$\langle \phi | \sigma | \phi \rangle = \sum_i |v_i|^2 \lambda_i . \qquad (5.1)$$

From the last equation, it is clear that $\langle \phi | \sigma | \phi \rangle \ge 0$ for all $|\phi\rangle \in \mathcal{H}$ iff $\lambda_i \ge 0$ for all $i$.
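For any operator $\sigma$ acting on $\mathcal{H}_a$ and for which $\mathrm{tr}_a\, \sigma \ne 0$, it is convenient to
define the normalizing function $\mathcal{N}(\sigma)$ by

$$\mathcal{N}(\sigma) = \frac{\sigma}{\mathrm{tr}_a\, \sigma} . \qquad (5.2)$$

Now suppose that $\rho$ is a density matrix acting on $\mathcal{H}_a \otimes \mathcal{H}_b$, and $\pi_a$ is a
projection operator ($\pi_a^2 = \pi_a$) acting on $\mathcal{H}_a$. Let

$$R = \mathrm{tr}_{a,b} (\pi_a \rho) . \qquad (5.3)$$

If we define

$$|\phi_{ab}\rangle = (\pi_a |a\rangle) |b\rangle \qquad (5.4)$$

for all $a \in S_a$ and $b \in S_b$, then

$$R = \sum_{a,b} \langle \phi_{ab} | \rho | \phi_{ab} \rangle \ge 0 . \qquad (5.5)$$

When $R \ne 0$, we can define the reduced density matrix $\mathrm{red}_{\pi_a}(\rho)$ by

$$\mathrm{red}_{\pi_a}(\rho) = \mathcal{N}[\mathrm{tr}_a (\pi_a \rho)] = R^{-1} \mathrm{tr}_a (\pi_a \rho) . \qquad (5.6)$$

Note that $\mathrm{red}_{\pi_a}(\rho)$ is indeed a density matrix. Clearly, it is Hermitian and it has
unit trace. Furthermore, for any $|\beta\rangle \in \mathcal{H}_b$, if we define

$$|\chi_{a\beta}\rangle = (\pi_a |a\rangle) |\beta\rangle \qquad (5.7)$$

for all $a \in S_a$, then

$$\langle \beta | \mathrm{red}_{\pi_a}(\rho) | \beta \rangle = R^{-1} \sum_a \langle \chi_{a\beta} | \rho | \chi_{a\beta} \rangle \ge 0 . \qquad (5.8)$$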

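Some possibilities for $\pi_a$ are:

(a) $\pi_a = 1$. Then

$$\mathrm{red}_{\pi_a}\, \rho = \mathrm{tr}_a\, \rho . \qquad (5.9)$$

Note that $\mathrm{tr}_a (U \rho U^\dagger) = \mathrm{tr}_a (\rho)$ for any unitary matrix $U$ acting on $\mathcal{H}_a$. However,
for other $\pi_a$'s, it may happen that $\mathrm{red}_{\pi_a} (U \rho U^\dagger) \ne \mathrm{red}_{\pi_a} (\rho)$. Thus, although
not true for $\mathrm{tr}_a(\cdot)$, $\mathrm{red}_{\pi_a}(\cdot)$ may depend on the basis used to evaluate it.

(b) $\pi_a = |\alpha\rangle \langle \alpha|$, where $|\alpha\rangle \in \mathcal{H}_a$. Then

$$\mathrm{red}_{\pi_a}\, \rho = \frac{\langle \alpha | \rho | \alpha \rangle}{\langle \alpha | \mathrm{tr}_b (\rho) | \alpha \rangle} . \qquad (5.10)$$

If $a, a' \in S_a$, then some possibilities for $|\alpha\rangle$ are $|a\rangle$, $\frac{1}{\sqrt{2}}(|a\rangle + |a'\rangle)$, and $|Av(a)\rangle$,
where

$$|Av(a)\rangle = \frac{1}{\sqrt{N_a}} \sum_a |a\rangle . \qquad (5.11)$$

We will call $|Av(a)\rangle$ the average of the $a$ basis.
Define

$$\Sigma_a = |Av(a)\rangle \langle Av(a)| , \qquad (5.12)$$

$$R = \langle Av(a) | \mathrm{tr}_b (\rho) | Av(a) \rangle . \qquad (5.13)$$

If $R \ne 0$, we can define the entry sum $E\Sigma_a (\rho)$ of $\rho$ in the $a$ basis by

$$E\Sigma_a (\rho) = \mathrm{red}_{\Sigma_a} (\rho) . \qquad (5.14)$$

Thus,

$$E\Sigma_a (\rho) = \mathcal{N}[\mathrm{tr}_a (\Sigma_a \rho)] = R^{-1} \langle Av(a) | \rho | Av(a) \rangle . \qquad (5.15)$$

$E\Sigma_a (\rho)$ is called an entry sum because it can be expressed as

$$E\Sigma_a (\rho) = \mathcal{N}\left( \sum_{a_1, a_2} \langle a_1 | \rho | a_2 \rangle \right) , \qquad (5.16)$$

where the sum is over all $a_1 \in S_a$ and $a_2 \in S_a$.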
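The difference between the partial trace of case (a) and the entry sum of Eq.(5.16) shows up clearly on a small example. The sketch below stores a density matrix on $\mathcal{H}_a \otimes \mathcal{H}_b$ as a dictionary keyed by `(a, b, a2, b2)`; this storage layout, the helper names, and the choice of a Bell state as input are all assumptions made for illustration.

```python
import math

def pure_state_density(psi):
    """rho[(a, b, a2, b2)] = psi(a, b) * conj(psi(a2, b2)) for a state on H_a (x) H_b."""
    return {(a, b, a2, b2): complex(psi[(a, b)]) * complex(psi[(a2, b2)]).conjugate()
            for (a, b) in psi for (a2, b2) in psi}

def trace_a(rho, n_a, n_b):
    """Case (a), Eq.(5.9): the partial trace tr_a(rho), as an n_b x n_b matrix."""
    return [[sum(rho[(a, b, a, b2)] for a in range(n_a)) for b2 in range(n_b)]
            for b in range(n_b)]

def entry_sum_a(rho, n_a, n_b):
    """Eq.(5.16): sum the entries <a1|rho|a2> over a1 and a2, then normalize to unit trace."""
    raw = [[sum(rho[(a1, b, a2, b2)] for a1 in range(n_a) for a2 in range(n_a))
            for b2 in range(n_b)] for b in range(n_b)]
    norm = sum(raw[b][b] for b in range(n_b))
    if abs(norm) < 1e-12:
        raise ValueError("entry sum is undefined when the normalization vanishes")
    return [[x / norm for x in row] for row in raw]

s = 1 / math.sqrt(2)
bell = {(0, 0): s, (0, 1): 0.0, (1, 0): 0.0, (1, 1): s}   # (|00> + |11>)/sqrt(2)
rho = pure_state_density(bell)
traced = trace_a(rho, 2, 2)       # diagonal matrix with entries 1/2
summed = entry_sum_a(rho, 2, 2)   # all four entries equal to 1/2
```

For this input the partial trace discards the off-diagonal entries and yields a maximally mixed matrix, whereas the entry sum keeps them and yields a pure state.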
6 Density Matrices Associated with a QB Net



In this section, we will describe a method for constructing many different density 
matrices associated with a single QB net. 

Consider a QB net $\mathcal{N}^Q$ with nodes labelled by the random variables $x_1, x_2, \ldots, x_N$.
We will consider density matrices which act on $\mathcal{H}_{(x.)_\Gamma}$, where $\Gamma$ is a subset of
$Z_{1,N}$. We will use $\Gamma(\rho)$ to represent the $\Gamma$ of density matrix $\rho$.

Let $A(x.)$ be the amplitude assigned by $\mathcal{N}^Q$ to story $x.$ . Assume that (see
Appendix A)

$$\sum_{x.} |A(x.)|^2 = 1 . \qquad (6.1)$$

Then we can define the meta state-vector $|\psi_{meta}\rangle$ and the meta density matrix $\mu$ of
$\mathcal{N}^Q$ by

$$|\psi_{meta}\rangle = \sum_{x.} A(x.) |x.\rangle , \qquad (6.2)$$

$$\mu = |\psi_{meta}\rangle \langle \psi_{meta}| . \qquad (6.3)$$

(Eq.(6.1) guarantees that $|\psi_{meta}\rangle$ has unit magnitude.) For example, if $\mathcal{N}^Q$ has 3
nodes $a$, $b$, $c$, then

$$|\psi_{meta}\rangle = \sum_{a,b,c} A(a, b, c) |a, b, c\rangle , \qquad (6.4)$$

$$\mu = \sum_r A(a, b, c) A^*(a', b', c') |a, b, c\rangle \langle a', b', c'| . \qquad (6.5)$$

Note that $|x.\rangle$ in Eq.(6.2) represents a ket in the Hilbert space $\mathcal{H}_{x.} = \mathcal{H}_{x_1} \otimes
\mathcal{H}_{x_2} \otimes \cdots \otimes \mathcal{H}_{x_N}$. This is not the conventional use of a tensor product of Hilbert
spaces. In Quantum Mechanics, such products are conventionally used to represent
a "system" described by $\mathcal{H}_{x.}$ which consists of $N$ "subsystems" such that the $i$'th
subsystem is described by $\mathcal{H}_{x_i}$. ($x_1$ might correspond to the position and $x_2$ to the
spin of the same particle, so the two subsystems may be associated with the same
particle.) In our usage, the spaces $\mathcal{H}_{x_i}$ correspond to the nodes of a QB net. They
need not correspond to separate subsystems. They might, for example, correspond
to the same subsystem at two different times.

Because it acts on this unusual Hilbert space, the meta density matrix $\mu$ is
unconventional. So why use it? Because it is incontestably a density matrix in the
formal sense (Hermitian, unit trace, non-negative). Furthermore, as we shall see in
what follows, $\mu$ proves to be a very useful tool for discussing QB nets. The reason why
$\mu$ is so useful is not hard to see. $\mu$ is a vast storehouse of information about its QB
net $\mathcal{N}^Q$. In fact, it stores the amplitude of all the Feynman stories of $\mathcal{N}^Q$. Applying
to $\mu$ one or more red() operators of the type discussed in Section 5, we can generate
many different reduced density matrices, all pertaining to the same QB net. For
example, for a QB net with 10 nodes, we might combine an entry sum $E\Sigma$ over some of the nodes with a partial trace $\mathrm{tr}$ over others.

Suppose $a$ is one of the nodes of the QB net, and consider $\mathrm{red}_{\pi_a}\, \mu$ for
various $\pi_a$.

(a) $\pi_a = |a\rangle \langle a|$ for some $a \in S_a$. Then $\mathrm{red}_{\pi_a}\, \mu = \mathcal{N}(\langle a | \mu | a \rangle)$. This corresponds
to an experiment in which node $\underline{a}$ is measured, and found to have a particular
value $a$. The experiment is run repeatedly, and those runs for which $\underline{a} \ne a$ are
rejected.

(b) $\pi_a = 1$. Then $\mathrm{red}_{\pi_a}\, \mu = \mathrm{tr}_a\, \mu$. This corresponds to an experiment in which
node $a$ is measured without any expectations as to the value obtained. The
experiment is run repeatedly. We sum over the various outcomes of the $a$
measurement.

(c) $\pi_a = |Av(a)\rangle \langle Av(a)|$. Then $\mathrm{red}_{\pi_a}\, \mu = E\Sigma_a\, \mu$. This corresponds to an experiment
in which node $a$ is NOT measured.

Suppose $\rho$ is a density matrix obtained by reducing a meta density matrix $\mu$,
and suppose $\rho$ acts on $\mathcal{H}_{(x.)_{\Gamma(\rho)}}$. Any node $a$ in $(x.)_{\Gamma(\rho)}$ will be said to be uncommitted,
neither measured nor unmeasured. Any node $a$ in $(x.)_{Z_{1,N} - \Gamma(\rho)}$ will be said to be either
measured or unmeasured. It is unmeasured iff to go from $\mu$ to $\rho$, one of the reductions
we performed was $\mathrm{red}_{\pi_a} = E\Sigma_a$ as in case (c) above. If node $a$ is measured as in
case (b) above (i.e., $\mathrm{red}_{\pi_a} = \mathrm{tr}_a$), we will say that it has been measured passively.
We describe this measurement as passive because it does not involve data rejection
by the observer like case (a) above.

Note that external nodes are always measured. If an observer does not measure
them, they are still measured passively by the environment. Thus, if $a$ is an
external node, then $E\Sigma_a (\mu)$ cannot be realized physically because $E\Sigma_a (\rho)$ describes
a situation in which $a$ is not measured.

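Suppose $\rho_{out}$ is obtained by e-summing $\mu$ over all internal nodes of the graph:

$$\rho_{out} = E\Sigma_{(x.)_{Z_{int}}} (\mu) . \qquad (6.6)$$

Then $\rho_{out}$ is a pure state. Here is why. Define

$$|\psi\rangle = \sum_{x.} A(x.) |(x.)_{Z_{ext}}\rangle . \qquad (6.7)$$

Now note that

$$\langle \psi | \psi \rangle = \sum_{(x.)_{Z_{ext}}} \left| \sum_{(x.)_{Z_{int}}} A(x.) \right|^2 = 1 , \qquad (6.8)$$

and

$$|\psi\rangle \langle \psi| = \sum_{x.} \sum_{x.'} A(x.) A^*(x.') |(x.)_{Z_{ext}}\rangle \langle (x.')_{Z_{ext}}| = \rho_{out} . \qquad (6.9)$$

QED.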




Figure 6.1: Fully connected 3 node graph.

To illustrate the definition of $\rho_{out}$, consider Fig.(6.1), which shows a fully
connected 3 node graph with nodes $a$, $b$, $c$. Nodes $a$, $b$ are internal and $c$ is external.
The $\mu$ for this net is given by Eq.(6.5). Define $\rho_{out}$ by

$$\rho_{out} = E\Sigma_{a,b}\, \mu . \qquad (6.10)$$

If

$$|\psi\rangle = \sum_{a,b,c} A(a, b, c) |c\rangle , \qquad (6.11)$$

then

$$|\psi\rangle \langle \psi| = \sum_{all} A(a, b, c) A^*(a', b', c') |c\rangle \langle c'| = \rho_{out} . \qquad (6.12)$$

$\rho_{out}$ corresponds to a situation in which none of the internal nodes are measured
and all the external ones are uncommitted. We will say that a density matrix has
maximum internal coherence if it corresponds to a situation in which none of the
internal nodes are measured. $\rho_{out}$ has maximum internal coherence. Reduced density
matrices obtained by reducing $\rho_{out}$ also have maximum internal coherence.
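The construction of Eqs.(6.10) to (6.12) can be traced numerically on a small net. The sketch below is only an illustration: it assumes amplitudes of the product form $A(a, b, c) = V_{cb} U_{ba} \psi_a$ (a special case of a 3 node net with $a$, $b$ internal and $c$ external), with made-up values for `psi_a`, `U` and `V`.

```python
import math
from itertools import product

s = 1 / math.sqrt(2)
psi_a = [0.6, 0.8j]            # hypothetical amplitudes for node a (unit norm)
U = [[s, s], [s, -s]]          # U[b][a], a unitary matrix
V = [[s, s], [s, -s]]          # V[c][b], a unitary matrix

def A(a, b, c):
    """Amplitude of the story (a, b, c); the product form is an assumption."""
    return V[c][b] * U[b][a] * psi_a[a]

stories = list(product(range(2), repeat=3))

# Meta density matrix, Eq.(6.5): mu[(x, y)] = A(x) A*(y) for stories x and y.
mu = {(x, y): A(*x) * A(*y).conjugate() for x in stories for y in stories}

# Entry sum over the internal nodes a and b, Eq.(6.10), normalized to unit trace.
raw = [[sum(mu[((a, b, c), (a2, b2, c2))]
            for a, b, a2, b2 in product(range(2), repeat=4))
        for c2 in range(2)] for c in range(2)]
norm = sum(raw[c][c] for c in range(2))
rho_out = [[x / norm for x in row] for row in raw]

# State vector of Eq.(6.11): amplitudes summed over the internal nodes.
psi_out = [sum(A(a, b, c) for a, b in product(range(2), repeat=2)) for c in range(2)]

# tr(rho_out^2) equals 1 exactly when rho_out is a pure state.
purity = sum(rho_out[c][c2] * rho_out[c2][c] for c in range(2) for c2 in range(2)).real
```

With these inputs `norm` is already one, as Eq.(6.8) requires, and `rho_out` coincides entry by entry with the outer product of `psi_out` with itself.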
7 Probabilities Associated with a QB Net



In this section, we will define various probability distributions associated with a QB 
net. 

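Consider a QB net $\mathcal{N}^Q$ with nodes labelled by the random variables $x_1, x_2, \ldots, x_N$.
Let $A(x.)$ be the amplitude assigned by $\mathcal{N}^Q$ to story $x.$ . Suppose $\Gamma$ is a non-empty
subset of $Z_{1,N}$. The probability of observing $(\underline{x}.)_\Gamma$ to have a value of $(x.)_\Gamma$ is

$$P[(x.)_\Gamma] = \frac{\chi[(x.)_\Gamma]}{\sum_{(y.)_\Gamma} \chi[(y.)_\Gamma]} , \qquad (7.1)$$

where

$$\chi[(x.)_\Gamma] = \sum_{(x.)_{Z_{ext} - \Gamma}} \left| \sum_{(x.)_{Z_{int} - \Gamma}} A(x.) \right|^2 . \qquad (7.2)$$

In Eq.(7.2) we sum the amplitudes over all internal nodes except those in $\Gamma$, then we
take the magnitude squared, then we sum that over all external nodes except those
in $\Gamma$. We can express $P[(x.)_\Gamma]$ in terms of the meta density matrix of the QB net:

$$P[(x.)_\Gamma] = \langle (x.)_\Gamma | \left[ \mathrm{tr}_{(x.)_{Z_{ext} - \Gamma}}\, E\Sigma_{(x.)_{Z_{int} - \Gamma}}\, \mu \right] | (x.)_\Gamma \rangle . \qquad (7.3)$$

Thus, $P[(x.)_\Gamma]$ corresponds to a situation in which the nodes in $\Gamma$ are projected to a
single state, those in $Z_{ext} - \Gamma$ are passively measured, and those in $Z_{int} - \Gamma$ are not
measured at all. Note that

$$\sum_{(x.)_\Gamma} P[(x.)_\Gamma] = 1 , \qquad (7.4)$$

as required for a probability distribution. However, if $\Gamma$ and $\Gamma'$ are non-empty disjoint
subsets of $Z_{1,N}$, then it is possible that

$$\sum_{(x.)_{\Gamma'}} P[(x.)_{\Gamma \cup \Gamma'}] \ne P[(x.)_\Gamma] . \qquad (7.5)$$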

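To illustrate the above definition of $P[(x.)_\Gamma]$, consider the 3 node Markov chain
$a \to b \to c$. Assume node $a$ has amplitudes $\psi_a$, where $\sum_a |\psi_a|^2 = 1$. Node $b$ (ditto,
$c$) has amplitudes $U_{ba}$ (ditto, $V_{cb}$), where $U_{ba}$ (ditto, $V_{cb}$) are the entries of a unitary
matrix. Then

$$P(a) = \sum_c \left| \sum_b V_{cb} U_{ba} \psi_a \right|^2 = |\psi_a|^2 , \qquad (7.6)$$

$$P(b) = \sum_c \left| \sum_a V_{cb} U_{ba} \psi_a \right|^2 = \left| \sum_a U_{ba} \psi_a \right|^2 , \qquad (7.7)$$

$$P(c) = \left| \sum_{a,b} V_{cb} U_{ba} \psi_a \right|^2 , \qquad (7.8)$$

$$P(b, c) = \frac{\left| \sum_a V_{cb} U_{ba} \psi_a \right|^2}{\sum_{b,c} \left| \sum_a V_{cb} U_{ba} \psi_a \right|^2} , \qquad (7.9)$$

$$P(a, b, c) = |V_{cb} U_{ba} \psi_a|^2 . \qquad (7.10)$$

Note that

$$\sum_{b,c} P(b, c) = 1 , \qquad (7.11)$$

but

$$\sum_b P(b, c) \ne P(c) . \qquad (7.12)$$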
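A concrete instance of Eq.(7.12) may help. The sketch below evaluates Eqs.(7.8) and (7.9) for binary nodes; taking both $U$ and $V$ to be the same real unitary matrix and $\psi = (1, 0)$ is an assumption made purely for illustration.

```python
import math
from itertools import product

s = 1 / math.sqrt(2)
psi = [1.0, 0.0]               # amplitudes of node a
U = [[s, s], [s, -s]]          # U[b][a], unitary
V = [[s, s], [s, -s]]          # V[c][b], unitary

def amplitude(a, b, c):
    """A(a, b, c) = V_cb U_ba psi_a for the chain a -> b -> c."""
    return V[c][b] * U[b][a] * psi[a]

def chi_c(c):
    """Eq.(7.2) with Gamma = {c}: a and b are internal, so they are summed coherently."""
    return abs(sum(amplitude(a, b, c) for a, b in product(range(2), repeat=2))) ** 2

def chi_bc(b, c):
    """Eq.(7.2) with Gamma = {b, c}: only a is summed coherently."""
    return abs(sum(amplitude(a, b, c) for a in range(2))) ** 2

norm_c = sum(chi_c(c) for c in range(2))
norm_bc = sum(chi_bc(b, c) for b, c in product(range(2), repeat=2))

P_c = [chi_c(c) / norm_c for c in range(2)]                                # Eq.(7.8)
P_bc = {(b, c): chi_bc(b, c) / norm_bc
        for b, c in product(range(2), repeat=2)}                           # Eq.(7.9)
sum_over_b = [sum(P_bc[(b, c)] for b in range(2)) for c in range(2)]
```

Here `P_c` is concentrated on $c = 0$, because the two paths through $b$ interfere destructively for $c = 1$, whereas `sum_over_b` assigns probability one half to each value of $c$: observing $b$ destroys the interference.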



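We can define conditional probabilities using the unconditional ones $P[(x.)_\Gamma]$
defined above. Suppose $\Gamma_1$ and $\Gamma_2$ are non-empty disjoint subsets of $Z_{1,N}$. The
conditional probability $P[(x.)_{\Gamma_1} | (x.)_{\Gamma_2}]$ of observing $(\underline{x}.)_{\Gamma_1}$ to have a value of $(x.)_{\Gamma_1}$,
given or conditioned upon the fact that $(\underline{x}.)_{\Gamma_2}$ is known to have the value $(x.)_{\Gamma_2}$, is

$$P[(x.)_{\Gamma_1} | (x.)_{\Gamma_2}] = \frac{P[(x.)_{\Gamma_1 \cup \Gamma_2}]}{P_{(x.)_{\Gamma_1}}[(x.)_{\Gamma_2}]} , \qquad (7.13)$$

where the denominator of this expression is defined by

$$P_{(x.)_{\Gamma_1}}[(x.)_{\Gamma_2}] = \sum_{(x.)_{\Gamma_1}} P[(x.)_{\Gamma_1 \cup \Gamma_2}] . \qquad (7.14)$$

Note that

$$\sum_{(x.)_{\Gamma_1}} P[(x.)_{\Gamma_1} | (x.)_{\Gamma_2}] = 1 . \qquad (7.15)$$

However, if $\Gamma_1$, $\Gamma'_1$ and $\Gamma_2$ are non-empty disjoint subsets of $Z_{1,N}$, then it is possible
that

$$\sum_{(x.)_{\Gamma'_1}} P[(x.)_{\Gamma_1 \cup \Gamma'_1} | (x.)_{\Gamma_2}] \ne P[(x.)_{\Gamma_1} | (x.)_{\Gamma_2}] . \qquad (7.16)$$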



To illustrate the definition of $P[(x_\cdot)_{\Gamma_1}|(x_\cdot)_{\Gamma_2}]$, consider again the 3 node Markov
chain $\underline{a} \rightarrow \underline{b} \rightarrow \underline{c}$. One has

$$P(a,b|c) = \frac{P(a,b,c)}{P(a,b;c)} , \tag{7.17}$$

where

$$P(a,b;c) = \sum_{a,b} P(a,b,c) . \tag{7.18}$$

Note that

$$\sum_{a,b} P(a,b|c) = 1 , \tag{7.19}$$

but

$$\sum_a P(a,b|c) \neq P(b|c) . \tag{7.20}$$

We can easily extend the definition Eq.(7.13) of $P[(x_\cdot)_{\Gamma_1}|(x_\cdot)_{\Gamma_2}]$ to the case
that $\Gamma_1$ and $\Gamma_2$ overlap. We simply equate $P[(x_\cdot)_{\Gamma_1}|(x_\cdot)_{\Gamma_2}]$ to $P[(x_\cdot)_{\Gamma_1-\Gamma_2}|(x_\cdot)_{\Gamma_2}]$, and
evaluate the latter with definition Eq.(7.13). For example, for a QB net with nodes
$\underline{a}, \underline{b}, \underline{c}$, $P[(a,b)|(b,c)] = P[a|(b,c)]$, and the right-hand side can be evaluated with
Eq.(7.13).

Given any density matrix associated with the QB net $\mathcal{N}^Q$, it is natural to
define a probability distribution with its diagonal entries. Suppose $\rho$ is a density
matrix that acts on the Hilbert space $\mathcal{H}_{(\underline{x}_\cdot)_{\Gamma(\rho)}}$, and suppose $\Gamma$ is a non-empty subset
of $\Gamma(\rho)$. We define

$$P_\rho[(x_\cdot)_\Gamma] = \langle (x_\cdot)_\Gamma | \, \mathrm{tr}_{(\underline{x}_\cdot)_{\Gamma(\rho)-\Gamma}} (\rho) \, | (x_\cdot)_\Gamma \rangle . \tag{7.21}$$

In the last equation, we trace $\rho$ over all nodes except those contained in $\Gamma$, then we
take the diagonal entries of the resulting operator. Note that

$$\sum_{(x_\cdot)_\Gamma} P_\rho[(x_\cdot)_\Gamma] = 1 . \tag{7.22}$$

Furthermore, if $\Gamma$ and $\Gamma'$ are non-empty disjoint subsets of $\Gamma(\rho)$, then

$$\sum_{(x_\cdot)_{\Gamma'}} P_\rho[(x_\cdot)_{\Gamma \cup \Gamma'}] = P_\rho[(x_\cdot)_\Gamma] . \tag{7.23}$$

We can describe the last result by saying that the family of probability distributions
$\{P_\rho[(x_\cdot)_\Gamma] \,|\, \Gamma \subset \Gamma(\rho)\}$ is closed under marginalization. We saw previously that the
family $\{P[(x_\cdot)_\Gamma] \,|\, \Gamma \subset Z_{1,N}\}$ does not possess this closure property.

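To illustrate the definition of $P_\rho[(x_\cdot)_\Gamma]$, consider a density matrix $\rho$ which acts
on $\mathcal{H}_{\underline{a},\underline{b},\underline{c}}$. Then

$$P_\rho(b,c) = \langle b,c | \, \mathrm{tr}_{\underline{a}} (\rho) \, | b,c \rangle , \tag{7.24}$$

$$\sum_{b,c} P_\rho(b,c) = 1 , \tag{7.25}$$

$$\sum_b P_\rho(b,c) = \langle c | \, \mathrm{tr}_{\underline{a},\underline{b}} (\rho) \, | c \rangle = P_\rho(c) . \tag{7.26}$$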

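Note that for any probability distribution $P[(x_\cdot)_\Gamma]$, we can find a density matrix
$\rho$ such that

$$P[(x_\cdot)_\Gamma] = P_\rho[(x_\cdot)_\Gamma] . \tag{7.27}$$

Indeed, just set

$$\rho = \mathrm{tr}_{(\underline{x}_\cdot)_{Z_{ext}-\Gamma}} \big[ \mathrm{E}\Sigma_{(\underline{x}_\cdot)_{Z_{int}-\Gamma}} (\mu) \big] . \tag{7.28}$$

Suppose $\mathcal{N}^C$ is the parent CB net of $\mathcal{N}^Q$. Suppose $\mu$ is the meta density
matrix of $\mathcal{N}^Q$. Then for any $\Gamma \subset Z_{1,N}$, $P_\mu[(x_\cdot)_\Gamma]$ of $\mathcal{N}^Q$ is identical to $P[(x_\cdot)_\Gamma]$ of
$\mathcal{N}^C$. For example, if $\mathcal{N}^Q$ had nodes $\underline{a}, \underline{b}, \underline{c}$ and amplitudes $A(a,b,c)$, then $P_\mu(a,b,c)$
for $\mathcal{N}^Q$ and $P(a,b,c)$ for $\mathcal{N}^C$ both equal $|A(a,b,c)|^2$. Likewise, $P_\mu(a,b)$ for $\mathcal{N}^Q$ and
$P(a,b)$ for $\mathcal{N}^C$ both equal $\sum_c |A(a,b,c)|^2$.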
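
The closure of the $P_\mu$ family under marginalization can be checked numerically. The sketch below is only an illustration under assumed inputs: it reuses a 3 node chain $\underline{a} \rightarrow \underline{b} \rightarrow \underline{c}$ with Hadamard matrices for $U$ and $V$ and $\psi = (1,0)$, and the helper name `P_mu` is hypothetical. Because only diagonal entries of $\mu$ survive the trace, $P_\mu$ reduces to a plain sum of $|A|^2$ over the discarded nodes.

```python
import math
from itertools import product

s = 1 / math.sqrt(2)
U = V = [[s, s], [s, -s]]   # assumed example: Hadamard on both links
psi = [1.0, 0.0]

# Story amplitudes A(a,b,c) = V_cb U_ba psi_a of the chain a -> b -> c.
A = {(a, b, c): V[c][b] * U[b][a] * psi[a]
     for a, b, c in product(range(2), repeat=3)}

def P_mu(keep):
    """Diagonal of mu = |psi_meta><psi_meta| after tracing out nodes not in `keep`.

    `keep` lists positions in the story tuple (0 = a, 1 = b, 2 = c).
    """
    dist = {}
    for story, amp in A.items():
        key = tuple(story[k] for k in keep)
        dist[key] = dist.get(key, 0.0) + abs(amp) ** 2
    return dist

P_bc = P_mu(keep=(1, 2))
P_c = P_mu(keep=(2,))
marginal = {(c,): sum(P_bc[(b, c)] for b in range(2)) for c in range(2)}
print(marginal, P_c)
```

Here the marginal of $P_\mu(b,c)$ over $b$ coincides with $P_\mu(c)$, in contrast with the behavior of $P(b,c)$ and $P(c)$ for the same chain.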

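We can define conditional probability distributions using the unconditional
ones $P_\rho[(x_\cdot)_\Gamma]$ defined above. Suppose $\Gamma_1$ and $\Gamma_2$ are non-empty disjoint subsets of
$\Gamma(\rho)$. Then we define

$$P_\rho[(x_\cdot)_{\Gamma_1}|(x_\cdot)_{\Gamma_2}] = \frac{P_\rho[(x_\cdot)_{\Gamma_1 \cup \Gamma_2}]}{P_\rho[(x_\cdot)_{\Gamma_2}]} . \tag{7.29}$$

Note that

$$\sum_{(x_\cdot)_{\Gamma_1}} P_\rho[(x_\cdot)_{\Gamma_1}|(x_\cdot)_{\Gamma_2}] = 1 . \tag{7.30}$$

Furthermore, if $\Gamma_1$, $\Gamma_1'$ and $\Gamma_2$ are non-empty disjoint subsets of $\Gamma(\rho)$, then

$$\sum_{(x_\cdot)_{\Gamma_1'}} P_\rho[(x_\cdot)_{\Gamma_1 \cup \Gamma_1'}|(x_\cdot)_{\Gamma_2}] = P_\rho[(x_\cdot)_{\Gamma_1}|(x_\cdot)_{\Gamma_2}] . \tag{7.31}$$

To illustrate the definition of $P_\rho[(x_\cdot)_{\Gamma_1}|(x_\cdot)_{\Gamma_2}]$, consider a density matrix $\rho$ which
acts on $\mathcal{H}_{\underline{a},\underline{b},\underline{c}}$. Then

$$P_\rho(a,b|c) = \frac{P_\rho(a,b,c)}{P_\rho(c)} , \tag{7.32}$$

$$\sum_{a,b} P_\rho(a,b|c) = 1 , \tag{7.33}$$

$$\sum_a P_\rho(a,b|c) = P_\rho(b|c) . \tag{7.34}$$

We can easily extend the definition Eq.(7.29) of $P_\rho[(x_\cdot)_{\Gamma_1}|(x_\cdot)_{\Gamma_2}]$ to the case
that $\Gamma_1$ and $\Gamma_2$ overlap. We simply equate $P_\rho[(x_\cdot)_{\Gamma_1}|(x_\cdot)_{\Gamma_2}]$ to $P_\rho[(x_\cdot)_{\Gamma_1-\Gamma_2}|(x_\cdot)_{\Gamma_2}]$,
and evaluate the latter with definition Eq.(7.29).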
8 Quantum Entropy: Its Definition and Properties

In this section, we will define various quantum entropies associated with a QB net.
The von Neumann quantum entropy of a density matrix $\rho$ is defined by

$$S(\rho) = -\mathrm{tr}(\rho \log_2 \rho) . \tag{8.1}$$

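When $\rho$ is related to a QB net, it is convenient to rephrase Eq.(8.1) in terms of
the node random variables of the net. Consider a QB net $\mathcal{N}^Q$ with $N$ nodes labelled
by the random variables $\underline{x}_1, \underline{x}_2, \ldots, \underline{x}_N$. Suppose $\rho$ is a density matrix that acts on
the Hilbert space $\mathcal{H}_{(\underline{x}_\cdot)_{\Gamma(\rho)}}$, and suppose $\Gamma$, $\Gamma_1$ and $\Gamma_2$ are non-empty subsets of $\Gamma(\rho)$.
$\Gamma_1$ and $\Gamma_2$ need not be disjoint. We define:

$$S_\rho[(\underline{x}_\cdot)_\Gamma] = S\big[ \mathrm{tr}_{(\underline{x}_\cdot)_{\Gamma(\rho)-\Gamma}} (\rho) \big] , \tag{8.2}$$

$$S_\rho[(\underline{x}_\cdot)_{\Gamma_1}|(\underline{x}_\cdot)_{\Gamma_2}] = S_\rho[(\underline{x}_\cdot)_{\Gamma_1 \cup \Gamma_2}] - S_\rho[(\underline{x}_\cdot)_{\Gamma_2}] , \tag{8.3}$$

$$S_\rho[(\underline{x}_\cdot)_{\Gamma_1} : (\underline{x}_\cdot)_{\Gamma_2}] = S_\rho[(\underline{x}_\cdot)_{\Gamma_1}] + S_\rho[(\underline{x}_\cdot)_{\Gamma_2}] - S_\rho[(\underline{x}_\cdot)_{\Gamma_1 \cup \Gamma_2}] . \tag{8.4}$$

For example, suppose $\underline{a}, \underline{b}, \underline{c}$ are nodes of a QB net. If $\rho$ is a density matrix which
acts on $\mathcal{H}_{\underline{a}}$, then

$$S_\rho(\underline{a}) = S(\rho) . \tag{8.5}$$

If instead, $\rho$ acts on $\mathcal{H}_{\underline{a},\underline{b},\underline{c}}$, then

$$S_\rho(\underline{a}) = S(\mathrm{tr}_{\underline{b},\underline{c}} \, \rho) , \tag{8.6}$$

$$S_\rho(\underline{a},\underline{b}) = S(\mathrm{tr}_{\underline{c}} \, \rho) , \tag{8.7}$$

$$S_\rho(\underline{a}|\underline{b}) = S_\rho(\underline{a},\underline{b}) - S_\rho(\underline{b}) , \tag{8.8}$$

$$S_\rho(\underline{a} : \underline{b}) = S_\rho(\underline{a}) + S_\rho(\underline{b}) - S_\rho(\underline{a},\underline{b}) . \tag{8.9}$$

Eqs.(8.2) to (8.4) for the quantum entropy $S_\rho(\cdot)$ are very natural generalizations of
Eqs.(3.2) to (3.4) for the classical entropy $H(\cdot)$.[13]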

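Note that definitions Eqs.(8.2) to (8.4) are independent of the order of the
node random variables within $(\underline{x}_\cdot)_{\Gamma_1}$ and $(\underline{x}_\cdot)_{\Gamma_2}$. For example, if $\rho$ is a density matrix
acting on $\mathcal{H}_{\underline{a},\underline{b},\underline{c}}$, then

$$S_\rho(\underline{a},\underline{b},\underline{c}) = S_\rho(\underline{a},\underline{c},\underline{b}) , \quad S_\rho[\underline{a}|(\underline{b},\underline{c})] = S_\rho[\underline{a}|(\underline{c},\underline{b})] . \tag{8.10}$$

It is convenient to extend definitions Eqs.(8.2) to (8.4) in the following two ways.
First, we will allow $(\underline{x}_\cdot)_{\Gamma_1}$ (ditto, $(\underline{x}_\cdot)_{\Gamma_2}$) to contain repeated random variables. If it
does, then we will throw out any extra copies of a random variable. For example, if
$\rho$ is a density matrix acting on $\mathcal{H}_{\underline{a},\underline{b},\underline{c}}$, then

$$S_\rho(\underline{a},\underline{a},\underline{b},\underline{c}) = S_\rho(\underline{a},\underline{b},\underline{c}) , \quad S_\rho[\underline{a}|(\underline{b},\underline{b},\underline{c})] = S_\rho[\underline{a}|(\underline{b},\underline{c})] . \tag{8.11}$$

Second, we will allow $(\underline{x}_\cdot)_{\Gamma_1}$ (ditto, $(\underline{x}_\cdot)_{\Gamma_2}$) to contain internal parentheses. If it does,
then we will ignore the internal parentheses. For example, if $\rho$ is a density matrix
acting on $\mathcal{H}_{\underline{a},\underline{b},\underline{c}}$, then

$$S_\rho[(\underline{a},\underline{b}),\underline{c}] = S_\rho(\underline{a},\underline{b},\underline{c}) , \quad S_\rho[\underline{a}|((\underline{b},\underline{c}),\underline{d})] = S_\rho[\underline{a}|(\underline{b},\underline{c},\underline{d})] . \tag{8.12}$$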

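Let $\underline{X} = (\underline{x}_\cdot)_{\Gamma_1}$, $\underline{Y} = (\underline{x}_\cdot)_{\Gamma_2}$ and $\underline{Z} = (\underline{x}_\cdot)_{\Gamma_3}$, where $\Gamma_1, \Gamma_2, \Gamma_3$ are non-empty,
possibly overlapping, subsets of $Z_{1,N}$. As with the function $H(\cdot)$, we will
extend further the domain of the function $S_\rho(\cdot)$ by introducing the following axioms:

$$S_\rho[(\underline{X},\underline{Y}) : \underline{Z}] = S_\rho[(\underline{X} : \underline{Z}), (\underline{Y} : \underline{Z})] , \tag{8.13}$$

$$S_\rho[(\underline{X} : \underline{Y}), \underline{Z}] = S_\rho[(\underline{X},\underline{Z}) : (\underline{Y},\underline{Z})] . \tag{8.14}$$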

Table 3 gives a list of properties (identities and inequalities) satisfied by the
quantum entropy $S_\rho(\cdot)$. Whenever possible, Table 3 matches each property of the
quantum entropy $S_\rho(\cdot)$ with an analogous property of the classical entropy $H(\cdot)$.
Analogous properties are indicated by $H \to S_\rho$. See Refs.[1]-[9] to get proofs of those
statements in Table 3 that are not proven in this paper.

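An identity satisfied by $S(\cdot)$ but with no classical counterpart is:

$$S(U \rho U^\dagger) = S(\rho) , \tag{8.15}$$

for any unitary matrix $U$ acting on the same Hilbert space as the density matrix $\rho$.
We say that $S(\cdot)$ is invariant under unitary transformations of its argument. Next
we will rephrase Eq.(8.15) in terms of the node random variables of a QB net. Let
$\Gamma_1$ and $\Gamma_2$ be disjoint sets whose union is $\Gamma(\rho)$. Define $\underline{X}_1 = (\underline{x}_\cdot)_{\Gamma_1}$, $\underline{X}_2 = (\underline{x}_\cdot)_{\Gamma_2}$, and
$\underline{X} = (\underline{x}_\cdot)_{\Gamma(\rho)}$. Thus, $\underline{X} = (\underline{X}_1, \underline{X}_2)$. $\rho$ acts on $\mathcal{H}_{\underline{X}}$ so we can express it as:

$$\rho = \sum_{all} |X\rangle \, \rho_{X,X'} \, \langle X'| . \tag{8.16}$$

Suppose $U$ acts on $\mathcal{H}_{\underline{X}_1}$. Then

$$U \rho U^\dagger = \sum_{all} |Y\rangle |X_2\rangle \, U_{Y,X_1} \, \rho_{(X_1,X_2),(X_1',X_2')} \, U^\dagger_{X_1',Y'} \, \langle X_2'| \langle Y'| = \sum_{all} |\psi_{\underline{Y}}(X_1)\rangle |X_2\rangle \, \rho_{(X_1,X_2),(X_1',X_2')} \, \langle X_2'| \langle \psi_{\underline{Y}}(X_1')| , \tag{8.17}$$

where

$$|\psi_{\underline{Y}}(X_1)\rangle = \sum_Y |\underline{Y} = Y\rangle \, U_{Y,X_1} . \tag{8.18}$$

The Hilbert space $\mathcal{H}_{\underline{Y}}$ has the same dimension as $\mathcal{H}_{\underline{X}_1}$. The vectors $|\psi_{\underline{Y}}(X_1)\rangle \in \mathcal{H}_{\underline{Y}}$
are orthonormal:

$$\langle \psi_{\underline{Y}}(X_1) | \psi_{\underline{Y}}(X_1') \rangle = \delta(X_1, X_1') . \tag{8.19}$$

Thus,

$$S(U \rho U^\dagger) = \big[ S(U \rho U^\dagger) \big]_{U=1} = S(\rho) . \tag{8.20}$$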



Suppose $\underline{X} = (\underline{x}_\cdot)_\Gamma$ for some non-empty set $\Gamma \subset \Gamma(\rho)$. The matrix $\mathrm{tr}_{(\underline{x}_\cdot)_{\Gamma(\rho)-\Gamma}} (\rho)$
used in definition Eq.(8.2) of $S_\rho(\underline{X})$ has diagonal entries which are the probabilities
$P_\rho(X)$ defined in Section 7. It is convenient to define a classical entropy for the $P_\rho(X)$
distribution:

$$H_\rho(\underline{X}) = -\sum_X P_\rho(X) \log_2 P_\rho(X) . \tag{8.21}$$

Because the probability distributions $P_\rho(X)$ are closed under marginalization, $H_\rho(\cdot)$
satisfies all the identities and inequalities (see Table 3) satisfied by the classical
entropy $H(\cdot)$.

It follows from Table 3 that

$$0 \leq S_\rho(\underline{X}) \leq H_\rho(\underline{X}) . \tag{8.22}$$

Thus, $H_\rho(\underline{X})$ is a useful upper bound on $S_\rho(\underline{X})$.

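The quantities $H_\rho(\underline{X})$ and $S_\rho(\underline{X})$ complement each other in what they tell us
about $\rho$ and $\underline{X}$. Indeed, note the following. Suppose $\underline{X} = (\underline{x}_\cdot)_\Gamma$ where $\Gamma \subset \Gamma(\rho)$. Let
$\rho' = \mathrm{tr}_{(\underline{x}_\cdot)_{\Gamma(\rho)-\Gamma}} (\rho)$ and let $M$ be the matrix with entries $\langle (x_\cdot)_\Gamma | \rho' | (x_\cdot')_\Gamma \rangle$, so that

$$S_\rho(\underline{X}) = -\mathrm{tr}(M \log_2 M) , \tag{8.23}$$

$$H_\rho(\underline{X}) = -\sum_i M_{ii} \log_2 M_{ii} . \tag{8.24}$$

$M$ is a diagonal matrix iff $S_\rho(\underline{X}) = H_\rho(\underline{X})$. Knowing $S_\rho(\underline{X})$ alone does not tell us
if $M$ is diagonal because $\mathrm{tr}(M \log_2 M)$ is invariant under unitary transformations of
$M$.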

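Henceforth, we will refer to the quantity

$$Q_\rho(\underline{X}) = H_\rho(\underline{X}) - S_\rho(\underline{X}) \tag{8.25}$$

as the coherence of $\underline{X}$ in $\rho$. Note that

$$0 \leq Q_\rho(\underline{X}) \leq \log_2 N_{\underline{X}} . \tag{8.26}$$

One has $Q_\rho(\underline{X}) = 0$ (i.e., zero coherence) iff $H_\rho(\underline{X}) = S_\rho(\underline{X})$, which is true iff
$M$ is diagonal. One has $Q_\rho(\underline{X}) = \log_2 N_{\underline{X}}$ (i.e., max. coherence) iff $S_\rho(\underline{X}) = 0$
and $H_\rho(\underline{X}) = \log_2 N_{\underline{X}}$. $S_\rho(\underline{X}) = 0$ iff there exists some column vector $v$ such that
$M = v v^\dagger$. $H_\rho(\underline{X}) = \log_2 N_{\underline{X}}$ iff the diagonal entries of $M$ are all equal. In fact, at
max. coherence, all the entries of $M$ have the same absolute value $1/N_{\underline{X}}$.

$Q_\rho(\underline{X}) = 0$ iff $\rho' = \mathrm{tr}_{(\underline{x}_\cdot)_{\Gamma(\rho)-\Gamma}} (\rho)$ is diagonal in the $\underline{X}$ basis $\{|(x_\cdot)_\Gamma\rangle\}$. Hence,
$Q_\rho(\underline{X})$ can also be interpreted as the mismatch between $\rho'$ and the $\underline{X}$ basis. At zero
mismatch, the $\underline{X}$ basis constitutes a set of eigenvectors of $\rho'$.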
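
The two extremes of the coherence can be seen on a single qubit. The Python sketch below is a minimal illustration, assuming $N_{\underline{X}} = 2$ and using the closed-form eigenvalues of a 2x2 Hermitian matrix; the function names `shannon`, `eigenvalues_2x2` and `coherence` are hypothetical helpers, not notation from the paper.

```python
import math

def shannon(probs):
    """Classical entropy in bits, skipping zero (or slightly negative) entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def eigenvalues_2x2(M):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    t = (M[0][0] + M[1][1]).real
    d = (M[0][0] * M[1][1] - M[0][1] * M[1][0]).real
    r = math.sqrt(max(t * t / 4 - d, 0.0))
    return [t / 2 + r, t / 2 - r]

def coherence(M):
    """Q = H - S for the matrix M of Eqs.(8.23)-(8.25)."""
    H = shannon([M[0][0].real, M[1][1].real])   # entropy of the diagonal
    S = shannon(eigenvalues_2x2(M))             # von Neumann entropy
    return H - S

M_pure = [[0.5, 0.5], [0.5, 0.5]]    # M = v v^dagger with v = (1, 1)/sqrt(2)
M_mixed = [[0.5, 0.0], [0.0, 0.5]]   # diagonal M

Q_pure = coherence(M_pure)
Q_mixed = coherence(M_mixed)
print(Q_pure, Q_mixed)
```

The first matrix has equal diagonal entries and rank one, so it reaches the upper bound $\log_2 2 = 1$ of Eq.(8.26); the second is diagonal and has zero coherence.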
9 Mixed States and Purification

In this section, we will show how any mixed state density matrix can be represented
by a QB net.

Figure 9.1: QB net for a mixed state.

Consider the QB net of Fig.(9.1), where

nodes             states              amplitudes    comments
$\underline{j}$   $j = (j_1, j_2)$    $\alpha_j$    $\sum_j |\alpha_j|^2 = 1$
$\underline{q}$
$\underline{r}$   $r$

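The meta density matrix $\mu$ for this net is

$$\mu = |\psi_{meta}\rangle \langle \psi_{meta}| , \tag{9.1}$$

where

$$|\psi_{meta}\rangle = \sum_{all} \alpha_{qr} \, |\underline{j} = (q,r)\rangle |\underline{q} = q\rangle |\underline{r} = r\rangle . \tag{9.2}$$

Define $\sigma$ and $\sigma_{\underline{q}}$ by

$$\sigma = \mathrm{E}\Sigma_{\underline{j}} (\mu) = \sum_{all} \alpha_{qr} \alpha^*_{q'r'} \, |q,r\rangle \langle q',r'| , \tag{9.3}$$

$$\sigma_{\underline{q}} = \mathrm{tr}_{\underline{r}} (\sigma) = \sum_{all} \alpha_{qr} \alpha^*_{q'r} \, |q\rangle \langle q'| . \tag{9.4}$$

Clearly, $\sigma$ is a pure state and $\sigma_{\underline{q}}$ is a mixed one. Since $\sigma$ is a pure state,

$$S_\sigma(\underline{q}, \underline{r}) = 0 . \tag{9.5}$$

By the Triangle Inequality (see Table 3),

$$S_\sigma(\underline{q}) = S_\sigma(\underline{r}) . \tag{9.6}$$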



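We've shown that some mixed state density matrices can be represented by
a QB net. But can any mixed state density matrix be represented in this manner?
Yes. This is why. Suppose $\rho$ is

$$\rho = \sum_{all} \beta_{qq'} \, |q\rangle \langle q'| . \tag{9.7}$$

Then the complex numbers $\beta_{qq'}$ define a Hermitian matrix $\beta$. One can always decompose
$\beta$ into $\beta = U \Gamma U^\dagger$, where $U$ is a unitary matrix and $\Gamma$ is a diagonal matrix. The
diagonal entries of $\Gamma$ are non-negative because $\rho$ is a density matrix, so $\sqrt{\Gamma}$ is well
defined. If we let $\alpha = U \sqrt{\Gamma}$, then

$$\beta = \alpha \alpha^\dagger . \tag{9.8}$$

Thus,

$$\rho = \sum_{all} \alpha_{qr} \alpha^*_{q'r} \, |q\rangle \langle q'| . \tag{9.9}$$

QED. The state

$$|\psi\rangle = \sum_{all} \alpha_{qr} \, |q,r\rangle \tag{9.10}$$

is called a purification of $\rho$, because the mixed state $\rho$ can be obtained from the pure
state $|\psi\rangle$ as follows:

$$\mathrm{tr}_{\underline{r}} \, |\psi\rangle \langle \psi| = \rho . \tag{9.11}$$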
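
The construction in Eqs.(9.7) to (9.11) can be traced numerically. The sketch below is a minimal illustration under assumed inputs: $U$ is taken to be a real 2x2 rotation by an arbitrary angle and $\Gamma = \mathrm{diag}(0.75, 0.25)$; the helper names `matmul` and `dagger` are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

theta = 0.3                                   # arbitrary rotation angle (assumption)
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]     # unitary matrix
Gamma = [0.75, 0.25]                          # non-negative eigenvalues summing to one

# beta = U Gamma U^dagger is the matrix of rho, as in Eq.(9.7).
beta = matmul(matmul(U, [[Gamma[0], 0.0], [0.0, Gamma[1]]]), dagger(U))

# alpha = U sqrt(Gamma) gives the purification amplitudes alpha_qr of Eq.(9.10).
alpha = [[U[q][r] * math.sqrt(Gamma[r]) for r in range(2)] for q in range(2)]

# Trace |psi><psi| over r: entry (q, q') is sum_r alpha_qr alpha*_q'r, Eq.(9.9).
reduced = [[sum(alpha[q][r] * complex(alpha[qp][r]).conjugate() for r in range(2))
            for qp in range(2)] for q in range(2)]

max_err = max(abs(reduced[q][qp] - beta[q][qp])
              for q in range(2) for qp in range(2))
norm = sum(abs(alpha[q][r]) ** 2 for q in range(2) for r in range(2))
print(max_err, norm)
```

The reduced matrix agrees with $\beta$ up to rounding, and $|\psi\rangle$ has unit norm because the entries of $\Gamma$ sum to one.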
10 Quantum System Interacting with Environment

In this section, we will consider QB nets that represent a quantum system interacting
with its environment one or more times.

10.1 Single Interaction

Figure 10.1: QB net for a system interacting once with its environment.

Consider the QB net of Fig.(10.1), where

nodes             states              amplitudes    comments
$\underline{j}$   $j = (j_1, j_2)$                  $\sum_j |\alpha_j|^2 = 1$
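Let $\mathcal{N}^Q$ be the QB net which contains all the nodes shown in Fig.(10.1). Let
$\mathcal{N}_0^Q$ be the sub-net which contains only nodes $\underline{j}, \underline{q}, \underline{r}$.

The meta density matrix $\mu_0$ of $\mathcal{N}_0^Q$ is

$$\mu_0 = |\psi^0_{meta}\rangle \langle \psi^0_{meta}| , \tag{10.1}$$

where

$$|\psi^0_{meta}\rangle = \sum_{all} \alpha_{qr} \, |\underline{j} = (q,r), q, r\rangle . \tag{10.2}$$

Define $\rho_0$ by

$$\rho_0 = \mathrm{E}\Sigma_{\underline{j}} \, \mu_0 = \sum_{all} \alpha_{qr} \alpha^*_{q'r'} \, |q,r\rangle \langle q',r'| . \tag{10.3}$$

$\rho_0$ is a pure state so

$$S_{\rho_0}(\underline{r}, \underline{q}) = 0 . \tag{10.4}$$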
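The meta density matrix $\mu$ of $\mathcal{N}^Q$ is

$$\mu = |\psi_{meta}\rangle \langle \psi_{meta}| , \tag{10.5}$$

where

$$|\psi_{meta}\rangle = \sum_{all} U(q_f, e_f | q, e) \, \beta_e \, \alpha_{qr} \, |\underline{j} = (q,r), q, r, e, \underline{t} = (q_f, e_f), q_f, e_f\rangle . \tag{10.6}$$

Define $\rho$ by

$$\rho = \mathrm{E}\Sigma_{\underline{j},\underline{q},\underline{e},\underline{t}} \, \mu = \sum_{all} U(q_f, e_f | q, e) \beta_e \alpha_{qr} \, U^*(q_f', e_f' | q', e') \beta^*_{e'} \alpha^*_{q'r'} \, |r, q_f, e_f\rangle \langle r', q_f', e_f'| . \tag{10.7}$$

$\rho$ is a pure state so

$$S_\rho(\underline{r}, \underline{q}_f, \underline{e}_f) = 0 . \tag{10.8}$$

By virtue of sub-additivity,

$$S_\rho(\underline{r}, \underline{e}_f) - S_\rho(\underline{e}_f) = S_\rho(\underline{r}|\underline{e}_f) \leq S_\rho(\underline{r}) . \tag{10.9}$$

By Eqs.(10.4) and (10.8) and the Triangle Inequality,

$$S_\rho(\underline{r}, \underline{e}_f) = S_\rho(\underline{q}_f) , \tag{10.10}$$

$$S_\rho(\underline{r}) = S_{\rho_0}(\underline{r}) = S_{\rho_0}(\underline{q}) . \tag{10.11}$$

Hence, Eq.(10.9) can be written as[14]

$$S_\rho(\underline{q}_f) - S_\rho(\underline{e}_f) \leq S_{\rho_0}(\underline{q}) . \tag{10.12}$$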
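
Eq.(10.12) can be checked on a small case. The following Python sketch assumes one-qubit nodes, $\alpha_{qr} = \delta(q,r)/\sqrt{2}$ (a maximally entangled system and reference), $\beta_e = \delta(e,0)$, and a CNOT gate for $U$ with the system as control and the environment as target. All of these choices, and the helper name `qubit_entropy`, are assumptions made for illustration.

```python
import math

def shannon(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def qubit_entropy(state, k):
    """Von Neumann entropy of qubit k for a pure state {bit-tuple: amplitude}."""
    m = [[0j, 0j], [0j, 0j]]
    for x, ax in state.items():
        for y, ay in state.items():
            # Partial trace: every qubit other than k must agree in ket and bra.
            if all(x[i] == y[i] for i in range(len(x)) if i != k):
                m[x[k]][y[k]] += ax * complex(ay).conjugate()
    t = (m[0][0] + m[1][1]).real
    d = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    r = math.sqrt(max(t * t / 4 - d, 0.0))
    return shannon([t / 2 + r, t / 2 - r])

s = 1 / math.sqrt(2)
# Qubit order (r, q, e): alpha_qr = delta(q,r)/sqrt(2), beta_e = delta(e,0).
initial = {(0, 0, 0): s, (1, 1, 0): s}
# Assumed interaction U: CNOT with control q and target e.
final = {(r, q, e ^ q): amp for (r, q, e), amp in initial.items()}

S_q0 = qubit_entropy(initial, 1)   # S_{rho_0}(q)
S_qf = qubit_entropy(final, 1)     # S_rho(q_f)
S_ef = qubit_entropy(final, 2)     # S_rho(e_f)
print(S_qf - S_ef, "<=", S_q0)
```

In this example the interaction leaves the system entropy unchanged while raising the environment entropy to one bit, so the left-hand side of Eq.(10.12) drops to zero.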
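Figure 10.2: QB net for a system interacting twice with its environment.

10.2 Multiple Interactions

Consider the QB net of Fig.(10.2), where

Let $\mathcal{N}_0^Q$ be the net which contains only nodes $\underline{j}, \underline{q}_1, \underline{r}$. For $\tau \in Z_{1,2}$, let $\mathcal{N}_\tau^Q$ be
the net which contains the previous net $\mathcal{N}_{\tau-1}^Q$ plus nodes $\underline{e}_\tau, \underline{t}_\tau, \underline{q}_{\tau f}, \underline{e}_{\tau f}$.
For $\tau \in Z_{0,2}$, the meta density matrix $\mu_\tau$ of net $\mathcal{N}_\tau^Q$ is

$$\mu_\tau = |\psi^\tau_{meta}\rangle \langle \psi^\tau_{meta}| , \tag{10.13}$$

where

$$|\psi^\tau_{meta}\rangle = \sum_{all} \Big( \prod_{\lambda=1}^{\tau} M_\lambda \Big) \alpha(q_1, r) \, |\underline{j} = (q_1, r), q_1, r\rangle , \tag{10.14}$$

where

$$M_\lambda = U_\lambda(q_{\lambda f}, e_{\lambda f} | q_\lambda, e_\lambda) \, \beta_\lambda(e_\lambda) \, |e_\lambda, \underline{t}_\lambda = (q_{\lambda f}, e_{\lambda f}), q_{\lambda f}, e_{\lambda f}\rangle . \tag{10.15}$$

Define $\rho_\tau$ for $\tau \in Z_{0,2}$ by

$$\rho_\tau = \mathrm{E}\Sigma_{\underline{X}(\tau)} (\mu_\tau) , \tag{10.16}$$

where $\underline{X}(\tau)$ represents all the internal nodes of $\mathcal{N}_\tau^Q$. Thus, $\rho_0$ acts on $\mathcal{H}_{(\underline{r}, \underline{q}_1)}$, $\rho_1$ acts
on $\mathcal{H}_{(\underline{r}, \underline{e}_{1f}, \underline{q}_{1f})}$, and $\rho_2$ acts on $\mathcal{H}_{(\underline{r}, \underline{e}_{1f}, \underline{e}_{2f}, \underline{q}_{2f})}$. For any $\tau \in Z_{0,2}$, $\rho_\tau$ is a pure state so

$$S_{\rho_0}(\underline{r}, \underline{q}_1) = 0 , \tag{10.17a}$$

$$S_{\rho_1}(\underline{r}, \underline{e}_{1f}, \underline{q}_{1f}) = 0 , \tag{10.17b}$$

$$S_{\rho_2}(\underline{r}, \underline{e}_{1f}, \underline{e}_{2f}, \underline{q}_{2f}) = 0 . \tag{10.17c}$$

Weak and strong sub-additivity imply

$$S_{\rho_2}(\underline{r}|\underline{e}_{1f}, \underline{e}_{2f}) \leq S_{\rho_1}(\underline{r}|\underline{e}_{1f}) \leq S_{\rho_0}(\underline{r}) , \tag{10.18}$$

which, by virtue of Eqs.(10.17), can be written as[14]

$$S_{\rho_2}(\underline{q}_{2f}) - S_{\rho_2}(\underline{e}_{1f}, \underline{e}_{2f}) \leq S_{\rho_1}(\underline{q}_{1f}) - S_{\rho_1}(\underline{e}_{1f}) \leq S_{\rho_0}(\underline{q}_1) . \tag{10.19}$$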
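Define $\sigma_\tau$ for all $\tau \in Z_{0,2}$ by

$$\sigma_\tau = \mathrm{E}\Sigma_{\underline{X}(\tau)} (\mu_\tau) , \tag{10.20}$$

where $\underline{X}(\tau)$ now represents all the internal nodes of $\mathcal{N}_\tau^Q$ except for $\underline{q}_1, \underline{q}_2, \underline{q}_3$. Thus,
$\sigma_0$ acts on $\mathcal{H}_{(\underline{r}, \underline{q}_1)}$, $\sigma_1$ acts on $\mathcal{H}_{(\underline{r}, \underline{e}_{1f}, \underline{q}_1, \underline{q}_2)}$, and $\sigma_2$ acts on $\mathcal{H}_{(\underline{r}, \underline{e}_{1f}, \underline{e}_{2f}, \underline{q}_1, \underline{q}_2, \underline{q}_3)}$. Next we
will show that

$$0 = S_{\sigma_0}(\underline{q}_1|\underline{q}_1) \leq S_{\sigma_1}(\underline{q}_1|\underline{q}_2) \leq S_{\sigma_2}(\underline{q}_1|\underline{q}_3) , \tag{10.21}$$

which is a quantum counterpart of the classical fixed sender DP inequality Eq.(4.16).
First note that

$$S_{\sigma_2}(\underline{q}_1|\underline{q}_3) = S_{\sigma_2}(\underline{q}_1|\underline{q}_{2f}) \geq S_{\sigma_2}(\underline{q}_1|\underline{q}_{2f}, \underline{e}_{2f}) , \tag{10.22a}$$

where we've used $\underline{q}_3 = \underline{q}_{2f}$ and strong sub-additivity. Since $S(\cdot)$ is invariant under
unitary transformations of its argument,

$$S_{\sigma_2}(\underline{q}_1|\underline{q}_{2f}, \underline{e}_{2f}) = \big[ S_{\sigma_2}(\underline{q}_1|\underline{q}_{2f}, \underline{e}_{2f}) \big]_{U_2 = 1} = S_{\sigma_1}(\underline{q}_1|\underline{q}_2) + nil , \tag{10.22b}$$

where $nil$ equals $S\big( \sum_{all} \beta_2(e_2) \beta_2^*(e_2') \, |e_2\rangle \langle e_2'| \big)$, which is zero. Combining Eqs.(10.22),
we get

$$S_{\sigma_2}(\underline{q}_1|\underline{q}_3) \geq S_{\sigma_1}(\underline{q}_1|\underline{q}_2) . \tag{10.23}$$

QED. For an alternative proof of Eq.(10.21), see [12].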
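11 Two Mixtures Interacting

In this section, we will consider a QB net that represents two mixed states scattering
once off each other.

Figure 11.1: QB net for 2 mixtures interacting.

Consider the QB net of Fig.(11.1), where

Let $\mathcal{N}^Q$ be the QB net which contains all the nodes shown in Fig.(11.1). For
$\lambda \in Z_{1,2}$, let $\mathcal{N}_\lambda^Q$ be the sub-net which contains only nodes $\underline{j}_\lambda, \underline{q}_\lambda, \underline{r}_\lambda$.

For $\lambda \in Z_{1,2}$, the meta density matrix $\mu_\lambda$ of $\mathcal{N}_\lambda^Q$ is

$$\mu_\lambda = |\psi^\lambda_{meta}\rangle \langle \psi^\lambda_{meta}| , \tag{11.1}$$

where

$$|\psi^\lambda_{meta}\rangle = \sum_{all} \alpha_\lambda(q_\lambda, r_\lambda) \, |\underline{j}_\lambda = (q_\lambda, r_\lambda), q_\lambda, r_\lambda\rangle . \tag{11.2}$$

Define $\rho_\lambda$ by

$$\rho_\lambda = \mathrm{E}\Sigma_{\underline{j}_\lambda} \, \mu_\lambda . \tag{11.3}$$

$\rho_\lambda$ acts on $\mathcal{H}_{(\underline{q}_\lambda, \underline{r}_\lambda)}$ and it is a pure state so

$$S_{\rho_\lambda}(\underline{q}_\lambda, \underline{r}_\lambda) = 0 . \tag{11.4}$$

The meta density matrix $\mu$ of $\mathcal{N}^Q$ is

$$\mu = |\psi_{meta}\rangle \langle \psi_{meta}| , \tag{11.5}$$

where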

ri \A=1 / 

(11.6) 

Define p by: 

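$\rho$ acts on $\mathcal{H}_{(\underline{r}_1, \underline{r}_2, \underline{q}_{1f}, \underline{q}_{2f})}$ and it is a pure state so

$$S_\rho(\underline{r}_1, \underline{r}_2, \underline{q}_{1f}, \underline{q}_{2f}) = 0 . \tag{11.8}$$

According to Table 3,

$$|S_\rho(\underline{q}_{\lambda f}) - S_\rho(\underline{r}_\lambda)| \leq S_\rho(\underline{q}_{\lambda f}, \underline{r}_\lambda) \leq S_\rho(\underline{q}_{\lambda f}) + S_\rho(\underline{r}_\lambda) , \tag{11.9}$$

for $\lambda \in Z_{1,2}$. By Eq.(11.4) and the Triangle Inequality,

$$S_\rho(\underline{r}_\lambda) = S_{\rho_\lambda}(\underline{r}_\lambda) = S_{\rho_\lambda}(\underline{q}_\lambda) . \tag{11.10}$$

By Eq.(11.8) and the Triangle Inequality,

$$S_\rho(\underline{q}_{1f}, \underline{r}_1) = S_\rho(\underline{q}_{2f}, \underline{r}_2) . \tag{11.11}$$

Hence, Eq.(11.9) can be rewritten as[15]

$$|S_\rho(\underline{q}_{\lambda f}) - S_{\rho_\lambda}(\underline{q}_\lambda)| \leq S_\rho(\underline{q}_{1f}, \underline{r}_1) = S_\rho(\underline{q}_{2f}, \underline{r}_2) \leq S_\rho(\underline{q}_{\lambda f}) + S_{\rho_\lambda}(\underline{q}_\lambda) . \tag{11.12}$$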
12 POM

Given a Hilbert space $\mathcal{H}_{\underline{q}}$, a POM (Probability Operator Measure)[16] is a set
$\{F_b \,|\, b \in S_{\underline{b}}\}$ of non-negative Hermitian operators acting on $\mathcal{H}_{\underline{q}}$. In addition, the observables
$F_b$ must form a "complete" set, meaning that

$$\sum_b F_b = 1 . \tag{12.1}$$

If $\rho$ is a density matrix acting on the same Hilbert space $\mathcal{H}_{\underline{q}}$ as the $F_b$'s, then we can
define a probability distribution for the random variable $\underline{b}$ by

$$P(b) = \mathrm{tr}(\rho F_b) , \tag{12.2}$$

for all $b \in S_{\underline{b}}$. We call an experiment that yields the value $b$ for $\underline{b}$ with a probability
$P(b)$ a "generalized measurement".

We say that the $F_b$'s are (pairwise) orthogonal if $F_b F_{b'} = 0$ for all $b, b' \in S_{\underline{b}}$
such that $b \neq b'$. If the $F_b$'s are orthogonal, then we say that $\{F_b \,|\, \forall b\}$ is an orthogonal
POM.

An operator $F_b$ is said to have rank one if it can be represented in the form
$|\psi\rangle \langle \psi|$, where $|\psi\rangle$ need not have unit magnitude. If $|\psi\rangle$ does have unit magnitude,
then $F_b$ is a projector (i.e., $F_b^2 = F_b$). A rank one $F_b$ which is a projector is a pure state density
matrix. For this reason, if the $F_b$'s are all rank one projectors, then we say that $\{F_b \,|\, \forall b\}$ is a
pure POM.

A POM is both pure and orthogonal iff its $F_b$'s are (pairwise) orthogonal
projectors (i.e., $F_b F_{b'} = F_b \, \delta(b, b')$ for all $b, b' \in S_{\underline{b}}$). For such a POM, we can represent
each $F_b$ by $|b\rangle \langle b|$ where the $|b\rangle$'s are an orthonormal basis of $\mathcal{H}_{\underline{q}}$. Eq.(12.1) then
reduces to $\sum_b |b\rangle \langle b| = 1$. Such a POM is said to constitute a von Neumann or ideal
measurement.

In this section, we will show how to represent a POM as a QB net. Part (a)
will assume that the $F_b$'s are orthogonal projectors. Part (b) will not assume this.

12.1 Orthogonal Projector $F_b$'s

Consider the QB net of Fig.(12.1), where

Figure 12.1: QB net for orthogonal projector $F_b$'s.

Suppose the unitary operator $U$ satisfies:

$$U \big( |\phi\rangle_{\underline{q}} \otimes |0\rangle_{\underline{b}} \big) = \sum_b \big( F_b |\phi\rangle_{\underline{q}} \big) \otimes |b\rangle_{\underline{b}} , \tag{12.3}$$

for any unit-magnitude vector $|\phi\rangle_{\underline{q}} \in \mathcal{H}_{\underline{q}}$. One can show that, for any POM $\{F_b \,|\, \forall b\}$
acting on $\mathcal{H}_{\underline{q}}$, there exists a unitary operator $U$ that satisfies Eq.(12.3). Note that on
the right-hand side of Eq.(12.3), the state $|b\rangle$ acts as a pointer that points towards a
particular choice of $F_b$. Note that the completeness of the $F_b$'s and the unit-magnitude
of $|\phi\rangle_{\underline{q}}$ together imply that the right-hand side of Eq.(12.3) is a unit-magnitude vector.
The vector $|\phi\rangle_{\underline{q}} \otimes |0\rangle_{\underline{b}}$ upon which $U$ acts is likewise a unit-magnitude vector. The
fact that $U$ takes a unit-magnitude vector into another unit-magnitude vector (of the
same dimension) is consistent with the unitarity of $U$.
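Eq.(12.3) can be expressed in component form as follows:

$$\sum_{all} U(q_f, b_f | q, b) \, \phi(q) \, \delta_b^0 = \sum_{all} F_b(q_f | q) \, \phi(q) \, \delta_{b_f}^{b} , \tag{12.4}$$

for any function $\phi(q)$. ($\phi(q)$ need not be normalized since it appears on both sides of
the equation.)

Let $\mathcal{N}^Q$ be the QB net which contains all the nodes shown in Fig.(12.1). Let
$\mathcal{N}_0^Q$ be the sub-net which contains only nodes $\underline{j}, \underline{q}, \underline{r}$.
The meta density matrix $\mu_0$ of $\mathcal{N}_0^Q$ is

$$\mu_0 = |\psi^0_{meta}\rangle \langle \psi^0_{meta}| , \tag{12.5}$$

where

$$|\psi^0_{meta}\rangle = \sum_{all} \alpha_{qr} \, |\underline{j} = (q,r), q, r\rangle . \tag{12.6}$$

Define

$$\rho_0 = \mathrm{E}\Sigma_{\underline{j}} \, \mathrm{tr}_{\underline{r}} \, \mu_0 = \sum_{all} \alpha_{qr} \alpha^*_{q'r} \, |q\rangle \langle q'| . \tag{12.7}$$

The meta density matrix $\mu$ of $\mathcal{N}^Q$ is

$$\mu = |\psi_{meta}\rangle \langle \psi_{meta}| , \tag{12.8}$$

where

$$|\psi_{meta}\rangle = \sum_{all} U(q_f, b_f | q, b) \, \alpha_{qr} \, \delta_b^0 \, |\underline{j} = (q,r), q, r, b, \underline{t} = (q_f, b_f), q_f, b_f\rangle . \tag{12.9}$$

By Eq.(12.4), $|\psi_{meta}\rangle$ can also be expressed as

$$|\psi_{meta}\rangle = \sum_{all} F_{b_f}(q_f | q) \, \alpha_{qr} \, \delta_b^0 \, |\underline{j} = (q,r), q, r, b, \underline{t} = (q_f, b_f), q_f, b_f\rangle . \tag{12.10}$$

Define $\rho$ by

$$\rho = \mathrm{E}\Sigma_{\underline{j},\underline{q},\underline{b},\underline{t}} \, \mathrm{tr}_{\underline{r},\underline{q}_f} \, \mu . \tag{12.11}$$

In other words, we get $\rho$ by tracing $\mu$ over all the external nodes except $\underline{b}_f$, and
e-summing it over all the internal nodes. $\rho$ acts on $\mathcal{H}_{\underline{b}_f}$. Using the fact that the $F_b$'s
are orthogonal projectors, it is easy to show that

$$\rho = \sum_{all} F_b(q' | q) \, \alpha_{qr} \alpha^*_{q'r} \, |\underline{b}_f = b\rangle \langle \underline{b}_f = b| = \sum_b \mathrm{tr}(F_b \rho_0) \, |\underline{b}_f = b\rangle \langle \underline{b}_f = b| . \tag{12.12}$$

Thus,

$$\langle b | \rho | b \rangle = \mathrm{tr}(F_b \rho_0) . \tag{12.13}$$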
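12.2 General $F_b$'s

Figure 12.2: QB net for general $F_b$'s.

Consider the QB net of Fig.(12.2), where

This is the same as the table in Section 12.1, except that there are two new
nodes ($\underline{x}, \underline{x}_f$), and the states of node $\underline{t}$ have 3 components instead of 2.

Instead of Eq.(12.3), we now suppose the unitary operator $U$ satisfies:

$$U \big( |\phi\rangle_{\underline{q}} \otimes |0\rangle_{\underline{b}} \otimes |0\rangle_{\underline{x}} \big) = \sum_b \big( \sqrt{F_b} \, |\phi\rangle_{\underline{q}} \big) \otimes |b\rangle_{\underline{b}} \otimes |b\rangle_{\underline{x}} , \tag{12.14}$$

for any unit-magnitude vector $|\phi\rangle_{\underline{q}} \in \mathcal{H}_{\underline{q}}$.

Eq.(12.14) can be expressed in component form as follows:

$$\sum_{all} U(q_f, b_f, x_f | q, b, x) \, \phi(q) \, \delta_b^0 \delta_x^0 = \sum_{all} \sqrt{F_b}(q_f | q) \, \phi(q) \, \delta_{b_f}^{b} \delta_{x_f}^{b} . \tag{12.15}$$

Let $\mathcal{N}^Q$ be the QB net which contains all the nodes shown in Fig.(12.2). Let
$\mathcal{N}_0^Q$ be the sub-net which contains only nodes $\underline{j}, \underline{q}, \underline{r}$.
$\mu_0$ and $\rho_0$ are defined as in Section 12.1 above.
The meta density matrix $\mu$ of $\mathcal{N}^Q$ is

$$\mu = |\psi_{meta}\rangle \langle \psi_{meta}| , \tag{12.16}$$

where

$$|\psi_{meta}\rangle = \sum_{all} U(q_f, b_f, x_f | q, b, x) \, \alpha_{qr} \, \delta_b^0 \delta_x^0 \, |\underline{j} = (q,r), q, r, b, x, \underline{t} = (q_f, b_f, x_f), q_f, b_f, x_f\rangle . \tag{12.17}$$

By Eq.(12.15), $|\psi_{meta}\rangle$ can also be expressed as

$$|\psi_{meta}\rangle = \sum_{all} \sqrt{F_{b_f}}(q_f | q) \, \alpha_{qr} \, \delta_b^0 \delta_x^0 \delta_{x_f}^{b_f} \, |\underline{j} = (q,r), q, r, b, x, \underline{t} = (q_f, b_f, x_f), q_f, b_f, x_f\rangle . \tag{12.18}$$

Define $\rho$ by

$$\rho = \mathrm{E}\Sigma_{\underline{j},\underline{q},\underline{b},\underline{x},\underline{t}} \, \mathrm{tr}_{\underline{r},\underline{q}_f,\underline{x}_f} \, \mu . \tag{12.19}$$

In other words, we get $\rho$ by tracing $\mu$ over all the external nodes except $\underline{b}_f$, and
e-summing it over all the internal nodes. $\rho$ acts on $\mathcal{H}_{\underline{b}_f}$. It is easy to show that

$$\rho = \sum_b \mathrm{tr}(F_b \rho_0) \, |\underline{b}_f = b\rangle \langle \underline{b}_f = b| . \tag{12.20}$$

Thus,

$$\langle b | \rho | b \rangle = \mathrm{tr}(F_b \rho_0) . \tag{12.21}$$

Whereas in Section 12.1, the orthogonal projector property of the $F_b$'s "forces" $\rho$ to
be diagonal, in this section, it is the tracing over node $\underline{x}_f$, a passive measurement of
that node, which forces $\rho$ to be diagonal.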
13 Signal Ensembles

Suppose $\{w_a \,|\, a \in Z_{1,N}\}$ is a collection of non-negative numbers which add up to one.
Suppose $\{\rho_a \,|\, a \in Z_{1,N}\}$ is a collection of density matrices all acting on the same
Hilbert space $\mathcal{H}$. Let

$$\rho = \sum_a w_a \rho_a . \tag{13.1}$$

We will say that $\rho$ is a weighted sum of density matrices. We will call the collection
$\mathcal{E} = \{(w_a, \rho_a) \,|\, \forall a\}$ a signal ensemble. We will call the $w_a$'s the weights of $\mathcal{E}$ and the
$\rho_a$'s the signal states or signals of $\mathcal{E}$.

In Quantum Information Theory, one is often interested in density matrices 
like p and ensembles like S. One envisions sending a message encoded as a string (for 
example: pi, ps, pa, pi) of signal states. (It is assumed that the states in the string are 
separated in some way, perhaps by intervening idle time periods.) To say something 
about the average behavior of such messages, one needs to consider p and S. 

We'll say two signals are orthogonal if $\rho_a \rho_b = 0$ for $a \neq b$. A signal ensemble
such that all its signals are mutually orthogonal will be called an orthogonal
ensemble. Orthogonal ensembles play a special role in Quantum Information
Theory, since their signals are perfectly distinguishable (by a generalized measurement
with $F_b = \rho_b / (\sum_{b'} \rho_{b'})$). Suppose we are given a non-orthogonal signal ensemble
$\{(w_a, \rho_a) \,|\, \forall a\}$. Then we can always replace it by an orthogonal one. Indeed, if
$\{|a\rangle \,|\, a \in Z_{1,N}\}$ is an orthonormal basis for some Hilbert space different from the one
on which the $\rho_a$'s act, and we define

$$\sigma_a = |a\rangle \langle a| \, \rho_a , \tag{13.2}$$

for all $a$, then the ensemble $\mathcal{E}' = \{(w_a, \sigma_a) \,|\, \forall a\}$ is orthogonal. Let

$$\sigma = \sum_a w_a \sigma_a = \sum_a w_a \, |a\rangle \langle a| \, \rho_a . \tag{13.3}$$

Note how in $\sigma$, each projector $|a\rangle \langle a|$ acts as a pointer that points towards a particular
choice of $\rho_a$. We will say that $\rho$ of Eq.(13.1) (ditto, $\sigma$ of Eq.(13.3)) is a weighted sum
of density matrices with scalar weights (ditto, orthogonal projector weights). Next,
we will show how both $\rho$ and $\sigma$ can be represented by a QB net.

13.1 Scalar Weights

Consider the QB net of Fig.(13.1), where

Figure 13.1: QB net for a weighted sum of density matrices with scalar weights.

The meta density matrix $\mu$ for this net is

$$\mu = |\psi_{meta}\rangle \langle \psi_{meta}| , \tag{13.4}$$

where

$$|\psi_{meta}\rangle = \sum_{all} \alpha(q, r | a) \sqrt{w_a} \, |a, \underline{j} = (q,r), q, r\rangle . \tag{13.5}$$

If we define $\rho$ by

$$\rho = \mathrm{E}\Sigma_{\underline{j}} \, \mathrm{tr}_{\underline{a},\underline{r}} \, \mu , \tag{13.6}$$

then

$$\rho = \sum_a w_a \rho_a , \tag{13.7}$$

where

$$\rho_a = \sum_{all/a} \alpha(q, r | a) \, \alpha^*(q', r | a) \, |q\rangle \langle q'| . \tag{13.8}$$

13.2 Orthogonal Projector Weights

Consider the QB net of Fig.(13.2), where
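Figure 13.2: QB net for a weighted sum of density matrices with orthogonal projector
weights.

The meta density matrix $\mu$ for this net is

$$\mu = |\psi_{meta}\rangle \langle \psi_{meta}| , \tag{13.9}$$

where

$$|\psi_{meta}\rangle = \sum_{all} \alpha(q, r | a) \sqrt{w_a} \, |\underline{j}' = (a,a), a, \underline{r}' = a\rangle \, |\underline{j} = (q,r), q, r\rangle . \tag{13.10}$$

If we define

$$\sigma = \mathrm{E}\Sigma_{\underline{j}',\underline{j}} \, \mathrm{tr}_{\underline{r}',\underline{r}} \, \mu , \tag{13.11}$$

then

$$\sigma = \sum_a (w_a \, |a\rangle \langle a|) \, \rho_a , \tag{13.12}$$

where

$$\rho_a = \sum_{all/a} \alpha(q, r | a) \, \alpha^*(q', r | a) \, |q\rangle \langle q'| . \tag{13.13}$$

Note that

$$S_\sigma(\underline{a}, \underline{q}) = -\sum_a \mathrm{tr}_{\underline{q}} \big[ w_a \rho_a \log_2 (w_a \rho_a) \big] = H(\vec{w}) + \sum_a w_a S(\rho_a) , \tag{13.14}$$

$$S_\sigma(\underline{q}) = S\Big( \sum_a w_a \rho_a \Big) , \tag{13.15}$$

$$S_\sigma(\underline{a}) = H(\vec{w}) . \tag{13.16}$$

Therefore,

$$S_\sigma(\underline{a} : \underline{q}) = S\Big( \sum_a w_a \rho_a \Big) - \sum_a w_a S(\rho_a) . \tag{13.17}$$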
14 Signal Distinguishability

In this section, we will define two measures of signal distinguishability, the Holevo
Information $\chi(\mathcal{E})$ and the Accessible Information $\chi_{acc}(\mathcal{E})$. Then we will use a QB net
to prove that $\chi_{acc}(\mathcal{E}) \leq \chi(\mathcal{E})$, a result known as Holevo's Inequality [17].

14.1 Holevo Information

Given a signal ensemble $\mathcal{E} = \{(w_a, \rho_a) \,|\, \forall a\}$, let

$$\rho = \sum_a w_a \rho_a . \tag{14.1}$$

The Holevo Information is defined by

$$\chi(\mathcal{E}) = S(\rho) - \sum_a w_a S(\rho_a) . \tag{14.2}$$

Some of the properties of $\chi(\mathcal{E})$ are:

(a) If the $\rho_a$'s are pure states, then $\chi(\mathcal{E}) = S(\rho)$.

(b) If the $\rho_a$'s are all the same, then $\chi(\mathcal{E}) = 0$. This result can be generalized as
follows. The concavity of $S(\cdot)$ (see Table 3) implies $0 \leq \chi(\mathcal{E})$, with equality iff
the $\rho_a$'s are all the same. Thus, $\chi(\mathcal{E})$ measures the distinguishability of the
signal states.

(c) If the $\rho_a$'s are orthogonal, then

$$S(\rho) = -\sum_a \mathrm{tr} \big[ w_a \rho_a \log_2 (w_a \rho_a) \big] , \tag{14.3}$$

because orthogonal $\rho_a$'s "don't mix" with each other so all sums over index $a$
collapse into a single outside sum. From Eq.(14.3), it follows that

$$S(\rho) = -\sum_a \mathrm{tr} \big[ w_a \rho_a (\log_2 w_a + \log_2 \rho_a) \big] = H(\vec{w}) + \sum_a w_a S(\rho_a) , \tag{14.4}$$

so

$$\chi(\mathcal{E}) = H(\vec{w}) . \tag{14.5}$$

We see that since orthogonal states are completely distinguishable, their quantum
entropy is essentially classical. This result can be generalized as follows.
According to Table 3,

$$\chi(\mathcal{E}) \leq H(\vec{w}) , \tag{14.6}$$

with equality iff the $\rho_a$'s are orthogonal.
(d) If the $\rho_a$'s commute (i.e., $\rho_a \rho_b = \rho_b \rho_a$ for all $a, b$), then $\chi(\mathcal{E})$ reduces to a
classical entropy. Indeed, because of the commutativity, the $\rho_a$'s can be simultaneously
diagonalized in an orthonormal basis $\{|b\rangle \,|\, \forall b\}$. In this basis, $S(\rho_a)$ for
all $a$ and $S(\rho)$ reduce to classical entropies. To calculate $\chi(\mathcal{E})$ explicitly, define
probabilities $P(b|a)$ and $P(a)$ by

$$\rho_a = \sum_b P(b|a) \, |b\rangle \langle b| , \tag{14.7}$$

$$P(a) = w_a . \tag{14.8}$$

Then

$$S(\rho) = S\Big( \sum_{a,b} P(a,b) \, |b\rangle \langle b| \Big) = S\Big( \sum_b P(b) \, |b\rangle \langle b| \Big) = H(\underline{b}) , \tag{14.9}$$

$$\sum_a w_a S(\rho_a) = \sum_a P(a) \Big\{ -\sum_b P(b|a) \log_2 P(b|a) \Big\} = H(\underline{b}|\underline{a}) , \tag{14.10}$$

so

$$\chi(\mathcal{E}) = H(\underline{a} : \underline{b}) . \tag{14.11}$$
14.2 Accessible Information

Suppose Alice sends Bob a signal $\rho_{a_0}$ using the signal ensemble $\mathcal{E} = \{(w_a, \rho_a) \,|\, \forall a\}$.
Bob knows which ensemble Alice is using, but he doesn't know $a_0$. To guess $a_0$, Bob
devises and measures a POM $\{F_b \,|\, \forall b\}$. The value $b$ that he measures for $\underline{b}$ will be
characterized by:

$$P(b|a) = \mathrm{tr}(F_b \rho_a) . \tag{14.12}$$

(This probability distribution specifies a so called quantum channel.) Since Bob knows
$\mathcal{E}$, he can use

$$P(a) = w_a \tag{14.13}$$

as the a priori probability for signal $\rho_a$ for all $a \in S_{\underline{a}}$. Bob would like to determine
the posterior probabilities $P(a|b)$ in terms of what he knows ($P(b|a)$ and $P(a)$). He
can do this with Bayes' rule:

$$P(a|b) = \frac{P(b|a) P(a)}{\sum_{a'} P(b|a') P(a')} . \tag{14.14}$$

Bob will guess best if he uses the magical POM $\{F_b \,|\, \forall b\}$ that minimizes the
spread of the probability distribution $P(a|b)$. This spread is measured by $H(\underline{a}|\underline{b})$.
But $H(\underline{a} : \underline{b})$ (called the "transmitted information") equals $H(\underline{a}) - H(\underline{a}|\underline{b})$ and $H(\underline{a})$
is $F_b$ independent. So the magical POM also maximizes the transmitted information
$H(\underline{a} : \underline{b})$.

For any signal ensemble $\mathcal{E} = \{(w_a, \rho_a) \,|\, \forall a\}$, we define the Accessible Information
by

$$\chi_{acc}(\mathcal{E}) = \max_{\{F_b\}} H(\underline{a} : \underline{b}) , \tag{14.15}$$

where $P(b|a)$ and $P(a)$ are defined by Eqs.(14.12) and (14.13). Since mutual entropies
are always non-negative, $\chi_{acc}(\mathcal{E}) \geq 0$. One can show that equality is achieved iff the
$\rho_a$'s are all the same. Hence, $\chi_{acc}(\mathcal{E})$ is a measure of distinguishability of the signals
$\rho_a$, just like $\chi(\mathcal{E})$ is. In fact, these two measures of distinguishability are related by
the so called Holevo's Inequality [17]:

$$\chi_{acc}(\mathcal{E}) \leq \chi(\mathcal{E}) , \tag{14.16}$$

which we will prove in the next section. It makes intuitive sense that $\chi_{acc}(\mathcal{E})$ is both a
measure of distinguishability and a measure of maximum information transmission.
One expects that making the signals which compose a message more distinguishable
will increase the information transmitted by the message.

14.3 Holevo's Inequality

Next, we will use a QB net to prove Holevo's Inequality.

Figure 14.1: QB net for proving Holevo's Inequality.

Consider the QB net of Fig.(14.1), where

The matrix $U$ must implement a general POM $\{F_b \,|\, \forall b\}$. Hence, it will be
assumed to satisfy Eq.(12.15), which we restate:

$$\sum_{all} U(q_f, b_f, x_f | q, b, x) \, \phi(q) \, \delta_b^0 \delta_x^0 = \sum_{all} \sqrt{F_b}(q_f | q) \, \phi(q) \, \delta_{b_f}^{b} \delta_{x_f}^{b} , \tag{14.17}$$

for any function $\phi(q)$.

Let $\mathcal{N}_f^Q$ be the QB net which contains all the nodes shown in Fig.(14.1). Let
$\mathcal{N}_0^Q$ be the sub-net which contains only nodes $\underline{j}', \underline{a}, \underline{r}', \underline{j}, \underline{q}, \underline{r}$.

The meta density matrix $\mu^0$ of $\mathcal{N}_0^Q$ was specified in Eq.(13.10). We also showed
in Section 13 that if $\rho^0$ is defined by

$$\rho^0 = \mathrm{E}\Sigma_{\underline{j}',\underline{j}} \, \mathrm{tr}_{\underline{r}',\underline{r}} \, \mu^0 , \tag{14.18}$$

then

$$\rho^0 = \sum_a (w_a \, |a\rangle \langle a|) \, \rho_a , \tag{14.19}$$

where

$$\rho_a = \sum_{all/a} \alpha(q, r | a) \, \alpha^*(q', r | a) \, |q\rangle \langle q'| . \tag{14.20}$$

Furthermore, we showed that if $\mathcal{E} = \{(w_a, \rho_a) \,|\, \forall a\}$, then

$$S_{\rho^0}(\underline{a} : \underline{q}) = \chi(\mathcal{E}) . \tag{14.21}$$
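The meta density matrix $\mu^f$ of $\mathcal{N}_f^Q$ is

$$\mu^f = |\psi^f_{meta}\rangle \langle \psi^f_{meta}| , \tag{14.22}$$

where

$$|\psi^f_{meta}\rangle = \sum_{all} U(q_f, b_f, x_f | q, b, x) \, \alpha(q, r | a) \sqrt{w_a} \, \delta_b^0 \delta_x^0 \, |\underline{j}' = (a,a), a, \underline{r}' = a, \underline{j} = (q,r), q, r, b, x, \underline{t} = (q_f, b_f, x_f), q_f, b_f, x_f\rangle . \tag{14.23}$$

Define $\rho^f$ by

$$\rho^f = \mathrm{E}\Sigma_{\underline{j}',\underline{j},\underline{q},\underline{b},\underline{x},\underline{t}} \, \mathrm{tr}_{\underline{r}',\underline{r}} \, \mu^f . \tag{14.24}$$

In other words, we trace $\mu^f$ over all the external nodes except $\underline{q}_f, \underline{b}_f, \underline{x}_f$, and e-sum
it over all internal ones except $\underline{a}$. Hence, $\rho^f$ acts on $\mathcal{H}_{\underline{q}_f, \underline{b}_f, \underline{x}_f, \underline{a}}$.
To prove Holevo's Inequality, we begin by noticing that

$$S_{\rho^f}[\underline{a} : (\underline{b}_f, \underline{q}_f, \underline{x}_f)] = S_{\rho^f}(\underline{a}) + S_{\rho^f}(\underline{b}_f, \underline{q}_f, \underline{x}_f) - S_{\rho^f}(\underline{a}, \underline{b}_f, \underline{q}_f, \underline{x}_f) , \tag{14.25a}$$

$$S_{\rho^f}(\underline{a}) = S_{\rho^0}(\underline{a}) , \tag{14.25b}$$

$$S_{\rho^f}(\underline{b}_f, \underline{q}_f, \underline{x}_f) = \big[ S_{\rho^f}(\underline{b}_f, \underline{q}_f, \underline{x}_f) \big]_{U=1} = S_{\rho^0}(\underline{q}) , \tag{14.25c}$$

$$S_{\rho^f}(\underline{a}, \underline{b}_f, \underline{q}_f, \underline{x}_f) = \big[ S_{\rho^f}(\underline{a}, \underline{b}_f, \underline{q}_f, \underline{x}_f) \big]_{U=1} = S_{\rho^0}(\underline{a}, \underline{q}) . \tag{14.25d}$$

Combining Eqs.(14.25) yields

$$S_{\rho^f}[\underline{a} : (\underline{b}_f, \underline{q}_f, \underline{x}_f)] = S_{\rho^0}(\underline{a} : \underline{q}) . \tag{14.26a}$$

By virtue of strong sub-additivity,

$$S_{\rho^f}(\underline{a} : \underline{b}_f) \leq S_{\rho^f}[\underline{a} : (\underline{b}_f, \underline{q}_f, \underline{x}_f)] . \tag{14.26b}$$

Below, we will show that

$$H_{\rho^f}(\underline{a} : \underline{b}_f) = S_{\rho^f}(\underline{a} : \underline{b}_f) . \tag{14.26c}$$

Combining Eqs.(14.21) and (14.26) finally yields Holevo's Inequality:

$$H_{\rho^f}(\underline{a} : \underline{b}_f) \leq S_{\rho^0}(\underline{a} : \underline{q}) = \chi(\mathcal{E}) . \tag{14.27}$$

This can be understood as a special case of the Fixed Sender Data Processing
Inequality [12], [18]. It says that when information is transmitted from $\underline{a}$, less reaches
$\underline{b}_f$ than $\underline{q}$.

To show Eq.(14.26c), we use Eq.(14.17) to express $\rho^f$ in terms of the POM
$\{F_b \,|\, \forall b\}$. It is then easy to show that

$$\mathrm{tr}_{\underline{q}_f, \underline{x}_f} \, \rho^f = \sum_{a,b} \mathrm{tr}(F_b \rho_a) \, w_a \, |a,b\rangle \langle a,b| . \tag{14.28}$$

Replacing $\mathrm{tr}(F_b \rho_a)$ and $w_a$ by $P(b|a)$ and $P(a)$ (see Eqs.(14.12) and (14.13)) yields

$$\mathrm{tr}_{\underline{q}_f, \underline{x}_f} \, \rho^f = \sum_{a,b} P(a,b) \, |a,b\rangle \langle a,b| . \tag{14.29}$$

Eq.(14.26c) now follows.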
14.4 Example 




X 



Figure 14.2: The vectors |0i), |02), l^s)- 



The following example (originally from Ref. [19]) is often used to illustrate 
Holevo's Bound. 
Let 



1 




1 

2 



-1 



-1 
V3 



;i4.30) 
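As shown in Fig. (14.2), these 3 vectors specify the corners of an equilateral triangle
that lies on the real plane. Now consider the signal ensemble $\mathcal{E} = \{(w_a, \rho_a)\,|\,\forall a\}$, with

$$ w_a = \frac{1}{3} , \tag{14.31} $$

$$ \rho_a = |\phi_a\rangle\langle\phi_a| \tag{14.32} $$

for $a \in Z_{1,3}$. It is easy to show that

$$ \rho = \sum_{a=1}^{3} w_a \rho_a = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} , \tag{14.33} $$

so

$$ \chi(\mathcal{E}) = S(\rho) = 1 . \tag{14.34} $$

Define a POM $\{F_b\,|\,\forall b\}$ by

$$ F_b = \frac{2}{3}\left(1 - |\phi_b\rangle\langle\phi_b|\right) , \tag{14.35} $$

where $b \in Z_{1,3}$. Then

$$ P(b|a) = \langle\phi_a|F_b|\phi_a\rangle =
\begin{cases} 0 & \text{if } a = b \\ \frac{1}{2} & \text{if } a \neq b \end{cases} . \tag{14.36} $$
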

According to Bayes' rule, in this case the posterior probabilities $P(a|b)$ are equal to
$P(b|a)$. Thus, if Bob measures this POM and obtains the value $b$, he can safely
conclude that Alice did not send signal $b$, and he can assign equal posterior probabilities
to the other two signals. One can show that this POM maximizes $H(a : b)$. Therefore,

$$ \chi_{acc}(\mathcal{E}) = H(a : b) = \log_2 3 - 1 = .5850 . \tag{14.37} $$

Holevo's Inequality is satisfied, as expected.

Another interesting ensemble considered in Refs.[19] and [8] is

$$ w_a = \frac{1}{3} , \tag{14.38} $$

$$ \rho_a = |\Phi_a\rangle\langle\Phi_a| , \tag{14.39} $$

$$ |\Phi_a\rangle = |\phi_a\rangle \otimes |\phi_a\rangle , \tag{14.40} $$

where $a \in Z_{1,3}$, and the vectors $|\phi_a\rangle$ are those defined previously in Eq.(14.30). One
finds $\chi(\mathcal{E}) = 1.5$ and $\chi_{acc}(\mathcal{E}) = 1.3691$.
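
The numbers quoted in this example can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names (`shannon_bits`, `von_neumann_bits`, `holevo_chi`) are illustrative choices of ours, not notation from the paper. It evaluates $\chi(\mathcal{E})$ for the trine ensemble, the mutual information $H(a : b)$ for the POM of Eq.(14.35), and $\chi(\mathcal{E})$ for the doubled ensemble of Eq.(14.40). It does not attempt the optimization over POMs that defines $\chi_{acc}$.

```python
import numpy as np

def shannon_bits(p):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def von_neumann_bits(rho):
    """Von Neumann entropy in bits, computed from the eigenvalues of rho."""
    return shannon_bits(np.linalg.eigvalsh(rho))

def holevo_chi(states, weights):
    """chi = S(sum_a w_a rho_a) - sum_a w_a S(rho_a) for pure signal states."""
    rhos = [np.outer(s, s.conj()) for s in states]
    rho = sum(w * r for w, r in zip(weights, rhos))
    return von_neumann_bits(rho) - sum(
        w * von_neumann_bits(r) for w, r in zip(weights, rhos))

# Trine vectors of Eq.(14.30) with uniform weights w_a = 1/3.
phi = [np.array([1.0, 0.0]),
       0.5 * np.array([-1.0, np.sqrt(3.0)]),
       0.5 * np.array([-1.0, -np.sqrt(3.0)])]
w = np.full(3, 1.0 / 3.0)

chi_single = holevo_chi(phi, w)

# POM elements F_b = (2/3)(1 - |phi_b><phi_b|), Eq.(14.35).
F = [(2.0 / 3.0) * (np.eye(2) - np.outer(p, p)) for p in phi]

# P_joint[b, a] = P(b|a) w_a, with P(b|a) = <phi_a|F_b|phi_a>.
P_joint = np.array([[w[a] * (phi[a] @ F[b] @ phi[a]) for a in range(3)]
                    for b in range(3)])
H_mutual = (shannon_bits(P_joint.sum(axis=0))
            + shannon_bits(P_joint.sum(axis=1))
            - shannon_bits(P_joint))

# Doubled ensemble |Phi_a> = |phi_a> (x) |phi_a>, Eq.(14.40).
chi_pair = holevo_chi([np.kron(p, p) for p in phi], w)

print(f"chi = {chi_single:.4f}, H(a:b) = {H_mutual:.4f}, chi(pair) = {chi_pair:.4f}")
```

The three printed values can be compared directly with Eqs.(14.34) and (14.37) and with the value $\chi(\mathcal{E}) = 1.5$ quoted for the second ensemble.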
15 EPR Pair



In this section, we will consider a QB net that represents an EPR pair. An EPR pair 
consists of two spin half particles in a singlet state (i.e., a state of zero total spin). 

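Suppose $|+_z\rangle$ and $|-_z\rangle$ are the states of spin up and down in the +Z direction.
We define $|\psi_{EPR}\rangle$ by

$$ |\psi_{EPR}\rangle = \frac{1}{\sqrt{2}}\left( |+_z\rangle \otimes |-_z\rangle - |-_z\rangle \otimes |+_z\rangle \right) . \tag{15.1} $$

Let

$$ |+_z\rangle = |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} , \tag{15.2} $$

$$ |-_z\rangle = |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} . \tag{15.3} $$

If $e = (e_1, e_2) \in Bool^2$, then $\psi_{EPR}(e) = \langle e|\psi_{EPR}\rangle$ is

$$ \psi_{EPR}(e) = \frac{1}{\sqrt{2}}\left[ \delta_{e_1}^{0}\delta_{e_2}^{1} - \delta_{e_1}^{1}\delta_{e_2}^{0} \right] . \tag{15.4} $$
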


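Figure 15.1: QB net for EPR pair.

Consider the QB net of Fig. (15.1), where

nodes   states                          amplitudes           comments
-----   -----------------------------   ------------------   --------
$e$     $e = (e_1, e_2) \in Bool^2$     $\psi_{EPR}(e)$
$x$     $x \in Bool$                    $\delta(x, e_1)$
$y$     $y \in Bool$                    $\delta(y, e_2)$
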

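The meta density matrix $\mu$ of this net is

$$ \mu = |\psi_{meta}\rangle\langle\psi_{meta}| , \tag{15.5} $$

where

$$ |\psi_{meta}\rangle = \sum_{x,y} \psi_{EPR}(x, y)\, |\underline{e} = (x, y), x, y\rangle . \tag{15.6} $$

Define $\rho$ by:

$$ \rho = e\!\sum_{e}\, \mu . \tag{15.7} $$

Then $\rho$ acts on $\mathcal{H}_{x,y}$ and, with rows $(x, y)$ and columns $(x', y')$ ordered 00, 01, 10, 11,

$$ [\langle x, y|\rho|x', y'\rangle] = \frac{1}{2}
\begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 \\
0 & -1 & 1 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix} . \tag{15.9} $$
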



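$\rho$ is a pure state so $S_\rho(x, y) = 0$ and $S_\rho(x) = S_\rho(y)$. It is easy to show that

$$ \mathrm{tr}_y\, \rho = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} , \tag{15.10} $$

$$ \mathrm{tr}_x\, \rho = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} . \tag{15.11} $$

Thus,

$$ \begin{array}{lll}
S_\rho(x) = 1, & H_\rho(x) = 1 & \text{(zero coherence)} \\
S_\rho(y) = 1, & H_\rho(y) = 1 & \text{(zero coherence)} \\
S_\rho(x, y) = 0, & H_\rho(x, y) = 1 & \text{(not max. coherence since } H_\rho(x, y) \neq 2) \\
S_\rho(x|y) = -1, & H_\rho(x|y) = 0 & \\
S_\rho(y|x) = -1, & H_\rho(y|x) = 0 & \\
S_\rho(x : y) = 2, & H_\rho(x : y) = 1 &
\end{array} \tag{15.12} $$

Define $\rho(y)$ by

$$ \rho(y) = 2\, e\!\sum_{e}\, \langle y|\mu|y\rangle = 2\langle y|\rho|y\rangle . \tag{15.13} $$

The factor of 2 normalizes $\rho(y)$ to unit trace, as in Eq.(16.17). $\rho(y)$ acts on $\mathcal{H}_x$. It is easy to show that

$$ \rho(y) = |0\rangle\langle 0|\, \delta_y^1 + |1\rangle\langle 1|\, \delta_y^0 . \tag{15.14} $$

Thus,

$$ S_{\rho(y)}(x) = 0, \quad H_{\rho(y)}(x) = 0 \quad \text{(zero coherence)} . \tag{15.15} $$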

These results can be interpreted as follows. We start with an EPR pair of
particles. One particle goes to Alice ($x$). The other goes to Bob ($y$). The density
matrix called $\rho$ above corresponds to a situation in which Bob ignores his particle.
The particle is still measured passively by the environment. Alice gets no information
from the environment, so her particle has a 50/50 chance of being either up or down
along any direction. The density matrix called $\rho(y)$ above corresponds to a situation
in which, instead of ignoring his particle, Bob measures it along the +Z direction and
communicates the result to Alice. The experiment is repeated many times. When
Bob reports result $+z$, Alice sticks her particle into bin Bob+, and when he reports
$-z$, she sticks it into bin Bob-. Alice's particles in bin Bob+ (ditto, bin Bob-)
behave as if they were in pure state $|-_z\rangle$ (ditto, $|+_z\rangle$). (Note that Alice's particle
points opposite to Bob's. This is expected since the initial state $\psi_{EPR}$ of the two
particles has zero total spin, and this quantity is conserved during the experiment.)
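
The entropies of this section can be reproduced with a few lines of linear algebra. The sketch below assumes NumPy; the function names `S` (von Neumann entropy), `H` (Shannon entropy of the diagonal in the node basis) and `rho_given_y` are our own labels. It builds $\rho = |\psi_{EPR}\rangle\langle\psi_{EPR}|$ on $\mathcal{H}_{x,y}$, takes the partial traces, and forms $\rho(y) = 2\langle y|\rho|y\rangle$.

```python
import numpy as np

def bits(p):
    """Entropy in bits of a probability vector, skipping zeros."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def S(rho):
    """Quantum (von Neumann) entropy in bits."""
    return bits(np.linalg.eigvalsh(rho))

def H(rho):
    """Classical entropy of the diagonal of rho in the node basis."""
    return bits(np.real(np.diag(rho)))

# psi_EPR(x, y) of Eq.(15.4), basis order |x,y> = 00, 01, 10, 11.
psi = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)
rho = np.outer(psi, psi)
r4 = rho.reshape(2, 2, 2, 2)          # indices (x, y, x', y')
rho_x = np.einsum('xyzy->xz', r4)     # trace over y
rho_y = np.einsum('xyxw->yw', r4)     # trace over x

S_mutual = S(rho_x) + S(rho_y) - S(rho)   # S(x : y)
H_mutual = H(rho_x) + H(rho_y) - H(rho)   # H(x : y)
S_cond = S(rho) - S(rho_y)                # S(x | y)

def rho_given_y(y):
    """rho(y) = 2 <y|rho|y>, an operator on Alice's space H_x."""
    return 2.0 * r4[:, y, :, y]

print(S_mutual, H_mutual, S_cond)
print(rho_given_y(0))
```

The negative conditional entropy and the factor of two between the quantum and classical mutual information both come out of the same reduced matrices, which is the point of listing $S$ and $H$ side by side.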
16 Quantum Eraser



In this section, we will consider a QB net that represents a situation in which one 
member of an EPR pair is measured in a special way so as to exhibit a phenomenon 
loosely called "quantum erasing" . 

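Suppose $|+_n\rangle$ and $|-_n\rangle$ are the states of spin up and down in the +n direction,
where n is either X or Z. Let

$$ |+_z\rangle = |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} , \tag{16.1} $$

$$ |-_z\rangle = |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} , \tag{16.2} $$

$$ |+_x\rangle = |0_x\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} , \tag{16.3} $$

$$ |-_x\rangle = |1_x\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix} . \tag{16.4} $$

Define $U$ by

$$ U = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} . \tag{16.5} $$

Note that

$$ U|+_z\rangle = |+_x\rangle , \tag{16.6} $$

$$ U|-_z\rangle = |-_x\rangle . \tag{16.7} $$

Also note that for $y, r \in Bool$,

$$ \langle r|U|y\rangle = \frac{1}{\sqrt{2}}(-1)^{ry} . \tag{16.8} $$

Consider the QB net of Fig. (16.1), where
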



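nodes   states                          amplitudes                                  comments
-----   -----------------------------   -----------------------------------------   --------
$e$     $e = (e_1, e_2) \in Bool^2$     $\psi_{EPR}(e)$
$x$     $x \in Bool$                    $\delta(x, e_1)$
$y$     $y \in Bool$                    $\delta(y, e_2)$
$r$     $r \in Bool$                    $U(r|y) = \frac{1}{\sqrt{2}}(-1)^{ry}$

Figure 16.1: QB net for a quantum eraser.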

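Let $\mathcal{N}$ be the QB net which contains all the nodes shown in Fig.(16.1). Let
$\mathcal{N}_0$ be the sub-net which contains only nodes $x, e, y$.

The meta density matrix $\mu_0$ of $\mathcal{N}_0$ was given in Section 15. Let $\rho_0 = e\!\sum_{e}\, \mu_0$.
Thus, $\rho_0$ corresponds to what we called simply $\rho$ in Section 15.

The meta density matrix $\mu$ of $\mathcal{N}$ is

$$ \mu = |\psi_{meta}\rangle\langle\psi_{meta}| , \tag{16.9} $$

where

$$ |\psi_{meta}\rangle = \sum_{x,y,r} U(r|y)\, \psi_{EPR}(x, y)\, |\underline{e} = (x, y), x, y, r\rangle . \tag{16.10} $$

Define $\rho$ by:

$$ \rho = e\!\sum_{e,y}\, \mu . \tag{16.11} $$

Then

$$ \rho = \sum_{x,r,x',r'} \frac{1}{4}\left[ (-1)^{r}\delta_x^0 - \delta_x^1 \right]\left[ (-1)^{r'}\delta_{x'}^0 - \delta_{x'}^1 \right] |x, r\rangle\langle x', r'| , \tag{16.12} $$

so that, with rows $(x, r)$ and columns $(x', r')$ ordered 00, 01, 10, 11,

$$ [\langle x, r|\rho|x', r'\rangle] = \frac{1}{4}
\begin{pmatrix}
1 & -1 & -1 & -1 \\
-1 & 1 & 1 & 1 \\
-1 & 1 & 1 & 1 \\
-1 & 1 & 1 & 1
\end{pmatrix} . \tag{16.13} $$
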

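$\rho$ is a pure state so $S_\rho(x, r) = 0$ and $S_\rho(x) = S_\rho(r)$. It is easy to show that

$$ \mathrm{tr}_r\, \rho = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} , \tag{16.14} $$

$$ \mathrm{tr}_x\, \rho = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} . \tag{16.15} $$

Thus,

$$ \begin{array}{lll}
S_\rho(x) = 1, & H_\rho(x) = 1 & \text{(zero coherence)} \\
S_\rho(r) = 1, & H_\rho(r) = 1 & \text{(zero coherence)} \\
S_\rho(x, r) = 0, & H_\rho(x, r) = 2 & \text{(max. coherence)} \\
S_\rho(x|r) = -1, & H_\rho(x|r) = 1 & \\
S_\rho(r|x) = -1, & H_\rho(r|x) = 1 & \\
S_\rho(x : r) = 2, & H_\rho(x : r) = 0 &
\end{array} \tag{16.16} $$

Define $\rho(r)$ by

$$ \rho(r) = 2\, e\!\sum_{e,y}\, \langle r|\mu|r\rangle = 2\langle r|\rho|r\rangle . \tag{16.17} $$

$\rho(r)$ acts on $\mathcal{H}_x$. It is easy to show that

$$ \rho(r) = |0_x\rangle\langle 0_x|\, \delta_r^1 + |1_x\rangle\langle 1_x|\, \delta_r^0 . \tag{16.18} $$

Thus,

$$ S_{\rho(r)}(x) = 0, \quad H_{\rho(r)}(x) = 1 \quad \text{(max. coherence)} . \tag{16.19} $$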



Figure 16.2 (panels: y = 0, y = 1, r = 0, r = 1): Comparison of Feynman stories for
QB net Fig. (15.1) representing an EPR pair and QB net Fig.(16.1) representing a
quantum eraser.
These results can be interpreted as follows. We start with an EPR pair of
particles. One particle goes to Alice ($x$). The other goes to Bob ($y$, $r$). Bob passes his
particle through a Stern-Gerlach magnet that separates it into its $\pm x$ parts. The
density matrix called $\rho$ above corresponds to a situation in which Bob ignores his particle
after it leaves the Stern-Gerlach magnet. The particle is still measured passively by
the environment. Alice gets no information from the environment, so her particle has
a 50/50 chance of being either up or down along any direction. The density matrix
called $\rho(r)$ above corresponds to a situation in which, instead of ignoring his particle,
Bob measures it along the +X direction and communicates the result to Alice. The
experiment is repeated many times. When Bob reports result $+x$, Alice sticks her
particle into bin Bob+, and when he reports $-x$, she sticks it into bin Bob-. Alice's
particles in bin Bob+ (ditto, bin Bob-) behave as if they were in pure state $|-_x\rangle$
(ditto, $|+_x\rangle$). (Note that Alice's particle points opposite to Bob's. This is expected
since the initial state $\psi_{EPR}$ of the two particles has zero total spin, and this quantity
is conserved during the experiment.)

This is all very similar to Section 15. But note that in Section 15, Alice's
particle ends in state $+z$ (or $-z$, depending on the result of Bob's measurement),
whereas now it ends in state $+x$ (or $-x$). As shown in Fig. (16.2), if the value of $y$ is
fixed, then there is only one possible Feynman story. On the other hand, if the value
of $r$ is fixed, there are two possible Feynman stories. A related fact: In Section 15,
Alice's particle ends in a state characterized by the density matrix $|+_z\rangle\langle +_z|$, which is
diagonal in the $|\pm_z\rangle$ basis, whereas now it ends in a state characterized by a density
matrix $|+_x\rangle\langle +_x|$ which isn't diagonal in the $|\pm_z\rangle$ basis.

We often say that an experiment of this sort is a "quantum eraser". By this,
we mean the following. According to Eqs. (15.12) and (16.19)

$$ S_{\rho_0}(x) = 1, \quad H_{\rho_0}(x) = 1 \quad \text{(zero coherence)} , \tag{16.20} $$

$$ S_{\rho(r)}(x) = 0, \quad H_{\rho(r)}(x) = 1 \quad \text{(max. coherence)} . \tag{16.21} $$

In Eq.(16.20), Bob ignores his particle. In Eq.(16.21), he passes it through a
Stern-Gerlach magnet and reports the result of his measurement to Alice. We can go from
minimum coherence (Eq.(16.20)) to maximum coherence (Eq.(16.21)) for node $x$
simply by asking Bob to do some extra processing. This extra processing seems to
erase the coherence destroying mechanism.

Note that the density matrix $\rho$ defined above acts on $\mathcal{H}_{x,r}$ and that

$$ \langle x|\, \langle r|\rho|r\rangle\, |x\rangle = \langle r|\, \langle x|\rho|x\rangle\, |r\rangle . \tag{16.22} $$

That is, the order in which we apply the reductions $\mathrm{red}_{|x\rangle\langle x|}$ and $\mathrm{red}_{|r\rangle\langle r|}$ does not
matter. This is often called the "delayed choice" phenomenon.

Note that we found $H_\rho(x : r) = 0$ in this section, whereas we found $H_{\rho_0}(x : y) = 1$
in Section 15. That is, $x$ and $r$ are independent whereas $x$ and $y$ aren't. That's
because $x$ and $y$ must have opposite values whereas $x$ and $r$ don't have to.
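
The contrast with Section 15 can be checked numerically. The sketch below assumes NumPy; `phi`, `S`, `H` and `rho_given_r` are illustrative names of ours. It sums the amplitude $U(r|y)\psi_{EPR}(x, y)$ over the internal node $y$ to get the pure state on $\mathcal{H}_{x,r}$, then compares the classical mutual information with the conditioned states $\rho(r) = 2\langle r|\rho|r\rangle$.

```python
import numpy as np

def bits(p):
    """Entropy in bits of a probability vector, skipping zeros."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def S(rho):
    """Quantum (von Neumann) entropy in bits."""
    return bits(np.linalg.eigvalsh(rho))

def H(rho):
    """Classical entropy of the diagonal of rho in the node basis."""
    return bits(np.real(np.diag(rho)))

psi_epr = np.array([[0.0, 1.0], [-1.0, 0.0]]) / np.sqrt(2)   # psi_epr[x, y]
U = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)          # U[r, y], Eq.(16.5)

# phi[x, r] = sum_y U[r, y] psi_epr[x, y]: amplitude after summing over node y.
phi = psi_epr @ U.T
rho = np.outer(phi.ravel(), phi.ravel())   # basis order |x,r> = 00, 01, 10, 11
r4 = rho.reshape(2, 2, 2, 2)               # indices (x, r, x', r')
rho_x = np.einsum('xrzr->xz', r4)          # trace over r
rho_r = np.einsum('xrxs->rs', r4)          # trace over x

H_mutual = H(rho_x) + H(rho_r) - H(rho)    # H(x : r)
S_mutual = S(rho_x) + S(rho_r) - S(rho)    # S(x : r)

def rho_given_r(r):
    """rho(r) = 2 <r|rho|r>, an operator on Alice's space H_x."""
    return 2.0 * r4[:, r, :, r]

print(np.round(4 * rho))
print(H_mutual, S_mutual, S(rho_given_r(0)), H(rho_given_r(0)))
```

The conditioned state is pure but has equal diagonal entries in the $|\pm_z\rangle$ basis, which is exactly the "max. coherence" entry of Eq.(16.19).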
17 Teleportation

In this section, we will consider a QB net that represents the phenomenon known as
Teleportation [20].

Figure 17.1: QB net for Teleportation.



Consider the QB net of Fig. (17.1), where 



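nodes   states                          amplitudes           comments
-----   -----------------------------   ------------------   ---------------------------
$e$     $e = (e_1, e_2) \in Bool^2$     $\psi_{EPR}(e)$
$x$     $x \in Bool$                    $\delta(x, e_1)$
$y$     $y \in Bool$                    $\delta(y, e_2)$
$a$     $a \in Bool$                    $\alpha_a$           $\sum_a |\alpha_a|^2 = 1$
$f$     $f = (f_1, f_2) \in Bool^2$     $U(f|a, x)$          $U$ specified below
$b$     $b \in Bool$                    $R(b|f, y)$          $R$ specified below
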



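Consider the so called "Bell basis" vectors $|\Psi(f)\rangle$:

$$ |\Psi(f)\rangle = \frac{1}{\sqrt{2}}\left[ |0, f_1\rangle + (-1)^{f_2} |1, \bar{f}_1\rangle \right] , \tag{17.1} $$

where $f \in Bool^2$, and $\bar{0} = 1$, $\bar{1} = 0$. $f_1$ tells us whether the two particles are in the
same or different states (different state iff $f_1 = 1$). $f_2$ tells us the sign between the
two kets being summed (minus sign iff $f_2 = 1$). For example,

$$ |\Psi(1, 1)\rangle = \frac{1}{\sqrt{2}}\left( |0, 1\rangle - |1, 0\rangle \right) . \tag{17.2} $$

The state $\psi_{EPR}(e)$ given above equals $\langle e|\Psi(1, 1)\rangle$.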

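We define the matrix $U$ mentioned above by

$$ U(f|a, x) = \langle\Psi(f)|a, x\rangle . \tag{17.3} $$

With rows labelled by $f$ and columns labelled by $(a, x)$, both ordered 00, 01, 10, 11,

$$ [U(f|a, x)] = \frac{1}{\sqrt{2}}
\begin{pmatrix}
1 & 0 & 0 & 1 \\
1 & 0 & 0 & -1 \\
0 & 1 & 1 & 0 \\
0 & 1 & -1 & 0
\end{pmatrix} . \tag{17.4} $$

The columns of $U$ are clearly orthonormal so $U$ is a unitary matrix.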

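The matrix $R$ mentioned above can be defined in terms of $U$ by

$$ R(b|f, y) = U(f|b, \bar{y})\, (-1)^{\bar{y}}\, (-1)^{f_1 f_2} \sqrt{2} . \tag{17.5} $$

Our reasons for defining $R$ in this way will become clear as we go on. Note that

$$ \sum_b |R(b|f, y)|^2 = 1 , \tag{17.6} $$

as required by the definition of QB nets.

It is convenient to define a function $K(\cdot)$ by

$$ K(x, y, a, f, b) = R(b|f, y)\, U(f|a, x)\, \psi_{EPR}(x, y) . \tag{17.7} $$

Substituting explicit expressions for $R$, $U$ and $\psi_{EPR}$ into the last equation yields

$$ K(x, y, a, f, b) = \frac{(-1)^{f_1 f_2}}{2}\, \delta_y^{\bar{x}}\, \delta_a^b \left[ \delta_a^0 \delta_x^{f_1} + \delta_a^1 \delta_x^{\bar{f}_1} \right] . \tag{17.8} $$

From this expression for $K(\cdot)$, it follows that

$$ \sum_{x,y} K = \frac{(-1)^{f_1 f_2}}{2}\, \delta_a^b , \tag{17.9a} $$

$$ \sum_{x,y,f} K = \delta_a^b . \tag{17.9b} $$

Define the following kets:

$$ |\psi_{in}\rangle = \sum_a \alpha_a\, |\underline{a} = a\rangle , \tag{17.10a} $$

$$ |\psi'_{in}\rangle = \sum_a \alpha_a\, |\underline{b} = a\rangle , \tag{17.10b} $$

$$ |\psi_{out}\rangle = \sum_{x.} A(x.)\, |(x.)_{Z_{ext}}\rangle = \sum_{all} K(x, y, a, f, b)\, \alpha_a\, |b\rangle , \tag{17.10c} $$

$$ |\psi_{out}(f)\rangle = 2 \sum_{all/f} K(x, y, a, f, b)\, \alpha_a\, |b\rangle . \tag{17.10d} $$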
Note that we don't sum over $f$ in the equation for $|\psi_{out}(f)\rangle$. It follows by Eqs.(17.9)
that the kets of Eqs.(17.10) have unit magnitude and that

$$ |\psi_{out}(f)\rangle = (-1)^{f_1 f_2}\, |\psi'_{in}\rangle , \tag{17.11} $$

$$ |\psi_{out}\rangle = |\psi'_{in}\rangle . \tag{17.12} $$

Because of Eq.(17.11), one says that the QB net of Fig. (17.1) "teleports" a quantum
state from node $a$ to node $b$. Without knowing the state $|\psi_{in}\rangle$, Alice at $f$ measures
the joint state delivered to her by $a$ and $x$. She obtains result $f$ which she sends by
classical means to Bob at $b$. Bob can choose to allow any value of $f$, or he can ignore
those repetitions of the experiment in which $f$ does not equal a particular value, say
(0, 1). In either case, the state $|\psi_{out}(f)\rangle$ emerging from Bob's lab $b$ is equal to $\pm|\psi'_{in}\rangle$.
Note that according to Eq.(17.12), even if Alice does not measure $f$, and instead she
sends a quantum message to Bob, $|\psi_{out}\rangle$ equals $|\psi'_{in}\rangle$. However, this is not "true"
teleportation. In "true" teleportation, we allow Alice to receive quantum messages
but not to send them.
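
The teleportation identity can be checked by brute-force summation over net stories. The sketch below is ours, not the paper's code; it assumes NumPy, takes $U(f|a, x) = \langle\Psi(f)|a, x\rangle$ for the Bell basis of Eq.(17.1), and assumes the reading $R(b|f, y) = \sqrt{2}\, U(f|b, \bar{y})(-1)^{\bar{y}}(-1)^{f_1 f_2}$ of Eq.(17.5), which is the form that makes $\sum_{x,y} K$ proportional to $\delta_a^b$. The function names are hypothetical.

```python
import numpy as np
from itertools import product

def U(f, a, x):
    """U(f|a,x) = <Psi(f)|a,x> for the Bell basis of Eq.(17.1)."""
    f1, f2 = f
    return ((a == 0) * (x == f1)
            + (-1) ** f2 * (a == 1) * (x == 1 - f1)) / np.sqrt(2)

def psi_epr(x, y):
    """Singlet amplitude psi_EPR(x, y) of Eq.(15.4)."""
    return ((x == 0) * (y == 1) - (x == 1) * (y == 0)) / np.sqrt(2)

def R(b, f, y):
    """Assumed form of Bob's node matrix R(b|f,y); see the lead-in."""
    f1, f2 = f
    ybar = 1 - y
    return np.sqrt(2) * U(f, b, ybar) * (-1) ** ybar * (-1) ** (f1 * f2)

def K(x, y, a, f, b):
    """K = R(b|f,y) U(f|a,x) psi_EPR(x,y), as in Eq.(17.7)."""
    return R(b, f, y) * U(f, a, x) * psi_epr(x, y)

def teleport(alpha, f):
    """|psi_out(f)> = 2 sum_{x,y,a} K alpha_a |b>, returned as a length-2 vector."""
    out = np.zeros(2, dtype=complex)
    for x, y, a, b in product((0, 1), repeat=4):
        out[b] += 2 * K(x, y, a, f, b) * alpha[a]
    return out

alpha = np.array([0.6, 0.8j])     # an arbitrary normalized input state
for f in product((0, 1), repeat=2):
    print(f, teleport(alpha, f))
```

For every Bell outcome $f$ the output should equal the input up to the sign $(-1)^{f_1 f_2}$, so Bob needs no knowledge of $\alpha$ beyond the two classical bits $f$.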

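The meta density matrix $\mu$ for the net of Fig.(17.1) is

$$ \mu = |\psi_{meta}\rangle\langle\psi_{meta}| , \tag{17.13} $$

where

$$ |\psi_{meta}\rangle = \sum_{all} K(x, y, a, f, b)\, \alpha_a\, |\underline{e} = (x, y), x, y, a, f, b\rangle . \tag{17.14} $$

Note that by Eqs.(17.9), $|\psi_{meta}\rangle$ has unit magnitude.
Define the reduced matrix $\sigma$ by

$$ \sigma = e\!\sum_{e,x,y}\, (\mu) . \tag{17.15} $$

It is easy to show that

$$ \sigma = |\phi_{a,b}\rangle\langle\phi_{a,b}| \otimes |\phi_f\rangle\langle\phi_f| , \tag{17.16} $$

where

$$ |\phi_{a,b}\rangle = \sum_a \alpha_a\, |\underline{a} = a, \underline{b} = a\rangle , \tag{17.17} $$

$$ |\phi_f\rangle = \sum_f \frac{(-1)^{f_1 f_2}}{2}\, |f\rangle . \tag{17.18} $$

Define

$$ H^{\alpha} = -\sum_{a \in Bool} |\alpha_a|^2 \log_2 |\alpha_a|^2 . \tag{17.19} $$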
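Next we will calculate classical and quantum entropies for various possible density
matrices $\rho$:

(a) $\rho = \mathrm{tr}_b\, \sigma$

Then

$$ \rho = \Big( \sum_a |\alpha_a|^2\, |a\rangle\langle a| \Big) \otimes |\phi_f\rangle\langle\phi_f| . \tag{17.20} $$

It is easy to show from Eq. (17.20) that

$$ \begin{array}{lll}
S_\rho(a) = H^{\alpha}, & H_\rho(a) = H^{\alpha} & \text{(zero coherence)} \\
S_\rho(f) = 0, & H_\rho(f) = 2 & \text{(max. coherence)} \\
S_\rho(a|f) = H^{\alpha}, & H_\rho(a|f) = H^{\alpha} & \\
S_\rho(f|a) = 0, & H_\rho(f|a) = 2 & \\
S_\rho(a : f) = 0, & H_\rho(a : f) = 0 &
\end{array} \tag{17.21} $$

(b) $\rho = \mathcal{N}(\langle f|\sigma|f\rangle)$

Then

$$ \rho = |\phi_{a,b}\rangle\langle\phi_{a,b}| . \tag{17.22} $$

Note that we get the same density matrix if we reduce $\sigma$ by projecting, tracing or
e-summing over node $f$:

$$ \mathcal{N}(\langle f|\sigma|f\rangle) = \mathrm{tr}_f\, \sigma = e\!\sum_{f}\, \sigma . \tag{17.23} $$

It is easy to show from Eq. (17.22) that

$$ \begin{array}{lll}
S_\rho(a) = H^{\alpha}, & H_\rho(a) = H^{\alpha} & \text{(zero coherence)} \\
S_\rho(b) = H^{\alpha}, & H_\rho(b) = H^{\alpha} & \text{(zero coherence)} \\
S_\rho(a, b) = 0, & H_\rho(a, b) = H^{\alpha} & \\
S_\rho(a|b) = -H^{\alpha}, & H_\rho(a|b) = 0 & \\
S_\rho(b|a) = -H^{\alpha}, & H_\rho(b|a) = 0 & \\
S_\rho(a : b) = 2H^{\alpha}, & H_\rho(a : b) = H^{\alpha} & \text{(transmitted info: quantum = 2 classical)}
\end{array} \tag{17.24} $$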
18 Qubit Bouncing (a.k.a. Dense Coding)



Ref. [21] was the first to discuss a phenomenon that we will call qubit bouncing. 
Qubit bouncing is often called "quantum super dense coding". In this section, we 
will consider a QB net that represents qubit bouncing. 




Figure 18.1: QB net for Qubit Bouncing 
Consider the QB net of Fig. (18.1), where 



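nodes   states                          amplitudes           comments
-----   -----------------------------   ------------------   ---------------------------
$e$     $e = (e_1, e_2) \in Bool^2$     $\psi_{EPR}(e)$
$x$     $x \in Bool$                    $\delta(x, e_1)$
$y$     $y \in Bool$                    $\delta(y, e_2)$
$a$     $a = (a_1, a_2) \in Bool^2$     $\alpha_a$           $\sum_a |\alpha_a|^2 = 1$
$t$     $t \in Bool$                    $R(t|a, x)$          $R$ specified below
$b$     $b = (b_1, b_2) \in Bool^2$     $U(b|t, y)$          $U$ specified below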



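The matrix $U$ in this section is identical to its namesake in the Teleportation
section:

$$ U(b|t, y) = \frac{1}{\sqrt{2}}\left[ \delta_t^0 \delta_y^{b_1} + (-1)^{b_2} \delta_t^1 \delta_y^{\bar{b}_1} \right] . \tag{18.1} $$

The matrix $R$ can be defined in terms of $U$ by

$$ R(t|a, x) = U(a|t, \bar{x})\, (-1)^{x} \sqrt{2} . \tag{18.2} $$

Our reasons for defining $R$ in this way will become clear as we go on. Note that

$$ \sum_t |R(t|a, x)|^2 = 1 , \tag{18.3} $$

as required by the definition of QB nets.

It is convenient to define a function $K(\cdot)$ by

$$ K(x, y, a, t, b) = U(b|t, y)\, R(t|a, x)\, \psi_{EPR}(x, y) . \tag{18.4} $$

Substituting explicit expressions for $R$, $U$ and $\psi_{EPR}$ into the last equation yields

$$ K(x, y, a, t, b) = \frac{1}{2}\, \delta_y^{\bar{x}}\, \delta_{a_1}^{b_1} \left[ \delta_t^0 \delta_y^{a_1} + (-1)^{a_2 + b_2} \delta_t^1 \delta_y^{\bar{a}_1} \right] . \tag{18.5} $$

From this expression for $K(\cdot)$, it follows that

$$ \sum_{x,y} K = \frac{1}{2}\, \delta_{a_1}^{b_1} \left[ \delta_t^0 + (-1)^{a_2 + b_2} \delta_t^1 \right] , \tag{18.6a} $$

$$ \sum_{x,y,t} K = \delta_a^b . \tag{18.6b} $$

Define the following kets:

$$ |\psi_{in}\rangle = \sum_a \alpha_a\, |\underline{a} = a\rangle , \tag{18.7a} $$

$$ |\psi'_{in}\rangle = \sum_a \alpha_a\, |\underline{b} = a\rangle , \tag{18.7b} $$

$$ |\psi_{out}\rangle = \sum_{x.} A(x.)\, |(x.)_{Z_{ext}}\rangle = \sum_{all} K(x, y, a, t, b)\, \alpha_a\, |b\rangle . \tag{18.7c} $$

It follows by Eqs.(18.6) that the kets of Eqs.(18.7) have unit magnitude and that

$$ |\psi_{out}\rangle = |\psi'_{in}\rangle . \tag{18.8} $$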
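The meta density matrix $\mu$ for the net of Fig.(18.1) is

$$ \mu = |\psi_{meta}\rangle\langle\psi_{meta}| , \tag{18.9} $$

where

$$ |\psi_{meta}\rangle = \sum_{all} K(x, y, a, t, b)\, \alpha_a\, |\underline{e} = (x, y), x, y, a, t, b\rangle . \tag{18.10} $$

Note that by Eqs.(18.6), $|\psi_{meta}\rangle$ has unit magnitude.
Define the reduced matrix $\sigma$ by

$$ \sigma = e\!\sum_{e,x,y}\, (\mu) . \tag{18.11} $$

It is easy to show that

$$ \sigma = |\phi_{a,t,b}\rangle\langle\phi_{a,t,b}| , \tag{18.12} $$

where

$$ |\phi_{a,t,b}\rangle = \sum_{all} \frac{1}{2}\, \delta_{a_1}^{b_1} \left[ \delta_t^0 + (-1)^{a_2 + b_2} \delta_t^1 \right] \alpha_a\, |a, t, b\rangle . \tag{18.13} $$

Define

$$ H^{\alpha} = -\sum_{a \in Bool^2} |\alpha_a|^2 \log_2 |\alpha_a|^2 , \tag{18.14} $$

$$ W_{a_1} = \sum_{a_2} |\alpha_{a_1 a_2}|^2 , \tag{18.15} $$

$$ H^{W} = -\sum_{a_1 \in Bool} W_{a_1} \log_2 (W_{a_1}) . \tag{18.16} $$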

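Next we will calculate classical and quantum entropies for various possible
density matrices $\rho$:

(a) $\rho = \mathrm{tr}_b\, \sigma$

Then

$$ \rho = \sum_{a_1, t} \frac{W_{a_1}}{2}\, |a_1, t\rangle\langle a_1, t| \otimes \rho_{a_1, t} , \quad
\rho_{a_1, t} = |\phi_{a_2}(a_1, t)\rangle\langle\phi_{a_2}(a_1, t)| , \quad
|\phi_{a_2}(a_1, t)\rangle = \frac{1}{\sqrt{W_{a_1}}} \sum_{a_2} (-1)^{t a_2} \alpha_{a_1 a_2}\, |a_2\rangle . \tag{18.17} $$

It is easy to show from Eqs.(18.17) that

$$ \begin{array}{lll}
S_\rho(a) = H^{\alpha}, & H_\rho(a) = H^{\alpha} & \text{(zero coherence)} \\
S_\rho(t) = 1, & H_\rho(t) = 1 & \text{(zero coherence)} \\
S_\rho(a, t) = 1 + H^{W}, & H_\rho(a, t) = 1 + H^{\alpha} & \\
S_\rho(a|t) = H^{W}, & H_\rho(a|t) = H^{\alpha} & \\
S_\rho(t|a) = 1 + H^{W} - H^{\alpha}, & H_\rho(t|a) = 1 & \\
S_\rho(a : t) = H^{\alpha} - H^{W}, & H_\rho(a : t) = 0 &
\end{array} \tag{18.18} $$

(b) $\rho = e\!\sum_{t}\, \sigma$

Then

$$ \rho = |\phi_{a,b}\rangle\langle\phi_{a,b}| , \tag{18.19a} $$

where

$$ |\phi_{a,b}\rangle = \sum_a \alpha_a\, |\underline{a} = a, \underline{b} = a\rangle . \tag{18.19b} $$

It is easy to show from Eqs.(18.19) that

$$ \begin{array}{lll}
S_\rho(a) = H^{\alpha}, & H_\rho(a) = H^{\alpha} & \text{(zero coherence)} \\
S_\rho(b) = H^{\alpha}, & H_\rho(b) = H^{\alpha} & \text{(zero coherence)} \\
S_\rho(a, b) = 0, & H_\rho(a, b) = H^{\alpha} & \\
S_\rho(a|b) = -H^{\alpha}, & H_\rho(a|b) = 0 & \\
S_\rho(b|a) = -H^{\alpha}, & H_\rho(b|a) = 0 & \\
S_\rho(a : b) = 2H^{\alpha}, & H_\rho(a : b) = H^{\alpha} & \text{(transmitted info: quantum = 2 classical)}
\end{array} \tag{18.20} $$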
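
As with teleportation, the qubit-bouncing identity $\sum_{x,y,t} K = \delta_a^b$ can be verified by summing over net stories. This is a sketch under stated assumptions: it uses NumPy, takes $U(b|t, y)$ from the Bell basis as in Eq.(18.1), and assumes the reading $R(t|a, x) = \sqrt{2}\, U(a|t, \bar{x})(-1)^{x}$ of Eq.(18.2). The names `decode` and `messages` are ours.

```python
import numpy as np
from itertools import product

def U(b, t, y):
    """U(b|t,y): Bell-basis amplitude, same matrix as in the Teleportation section."""
    b1, b2 = b
    return ((t == 0) * (y == b1)
            + (-1) ** b2 * (t == 1) * (y == 1 - b1)) / np.sqrt(2)

def psi_epr(x, y):
    """Singlet amplitude psi_EPR(x, y) of Eq.(15.4)."""
    return ((x == 0) * (y == 1) - (x == 1) * (y == 0)) / np.sqrt(2)

def R(t, a, x):
    """Assumed form of Alice's encoding node R(t|a,x); see the lead-in."""
    return np.sqrt(2) * U(a, t, 1 - x) * (-1) ** x

def K(x, y, a, t, b):
    """K = U(b|t,y) R(t|a,x) psi_EPR(x,y), as in Eq.(18.4)."""
    return U(b, t, y) * R(t, a, x) * psi_epr(x, y)

messages = list(product((0, 1), repeat=2))   # two-bit messages a and outcomes b

# decode[b, a] = sum over x, y, t of K: amplitude for Bob to read b given Alice's a.
decode = np.array([[sum(K(x, y, a, t, b)
                        for x, y, t in product((0, 1), repeat=3))
                    for a in messages]
                   for b in messages])

print(np.round(decode, 12))
```

If the assumed $R$ is right, `decode` is the 4 x 4 identity: one transmitted qubit $t$ carries both classical bits of $a$ to $b$.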
A Review of Classical and Quantum Bayesian Nets

In this Appendix, we give a brief review of Classical Bayesian (CB) and Quantum 
Bayesian (QB) nets. For more information, see Ref. [10]. 

First, we will state those properties which CB and QB nets have in common. 

We call a graph (or a diagram ) a collection of nodes with arrows connecting 
some pairs of these nodes. The arrows of the graph must satisfy certain constraints 
that will be specified below. We call a labelled graph a graph whose nodes are labelled. 
A CB net (ditto, a QB net) consists of two parts: a labelled graph with each node 
labelled by a random variable, and a collection of node matrices, one matrix for each 
node. These two parts must satisfy certain constraints that will be specified below. 

An internal arrow is an arrow that has a starting (source) node and a different 
ending (destination) one. We will use only internal arrows. We define two types of 
nodes: an internal node is a node that has one or more internal arrows leaving it, and 
an external node is a node that has no internal arrows leaving it. It is also common 
to use the terms root node or prior probability node for a node which has no incoming 
arrows (if any arrows touch it, they are outgoing ones). 

We restrict our attention to acyclic graphs; that is, graphs that do not contain 
cycles. (A cycle is a closed path of arrows with the arrows all pointing in the same 
sense.) 

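We assign a random variable to each node of a CB net. Suppose the random
variables assigned to the $N$ nodes are $\underline{x}_1, \underline{x}_2, \ldots, \underline{x}_N$. For each $j \in Z_{1,N}$, the random
variable $\underline{x}_j$ will be assumed to take on values within a finite set $S_{\underline{x}_j}$ called the set of
possible states of $\underline{x}_j$.

If $\Gamma = \{k_1, k_2, \ldots, k_{|\Gamma|}\} \subset Z_{1,N}$, and $k_1 < k_2 < \cdots < k_{|\Gamma|}$, define $(x.)_\Gamma =
(x_{k_1}, x_{k_2}, \ldots, x_{k_{|\Gamma|}})$ and $(\underline{x}.)_\Gamma = (\underline{x}_{k_1}, \underline{x}_{k_2}, \ldots, \underline{x}_{k_{|\Gamma|}})$. Sometimes, we also abbreviate
$(x.)_{Z_{1,N}}$ (i.e., the vector that includes all the possible $x_j$ components) by just $x.$, and
$(\underline{x}.)_{Z_{1,N}}$ by just $\underline{x}.$ . We often refer to $\underline{X} = (\underline{x}.)_\Gamma$ as a node collection. We say $\underline{X}$ is
empty if $|\Gamma| = 0$. If $|\Gamma| = 1$, we say it is a single-node node collection, and if $|\Gamma| > 1$,
we say it is a compound node collection. Given two node collections $\underline{X}_1 = (\underline{x}.)_{\Gamma_1}$ and
$\underline{X}_2 = (\underline{x}.)_{\Gamma_2}$, we say that $\underline{X}_1$ and $\underline{X}_2$ are disjoint (ditto, $\underline{X}_1$ is a subset of $\underline{X}_2$), if $\Gamma_1$
and $\Gamma_2$ are disjoint (ditto, $\Gamma_1 \subset \Gamma_2$).

Let $Z_{ext}$ be the set of all $j \in Z_{1,N}$ such that $\underline{x}_j$ is an external node, and let
$Z_{int}$ be the set of all $j \in Z_{1,N}$ such that $\underline{x}_j$ is an internal node. Clearly, $Z_{ext}$ and $Z_{int}$
are disjoint and their union is $Z_{1,N}$.

Each possible value $x.$ of $\underline{x}.$ defines a different net story. For any net story $x.$,
we call $(x.)_{Z_{int}}$ the internal state of the story and $(x.)_{Z_{ext}}$ its external state.

Define $\Gamma_j$ to be the set of all $k$ such that an arrow labelled $x_k$ (i.e., an arrow
whose source node is $\underline{x}_k$) enters node $\underline{x}_j$.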

Next, we will state those properties which are different in CB and QB nets. 
(a) Classical Bayesian Net

For each net story $x.$ of a CB net, we assign a non-negative number $P_j[x_j|(x.)_{\Gamma_j}]$
to each node $\underline{x}_j$. We call $P_j[x_j|(x.)_{\Gamma_j}]$ the probability of node $\underline{x}_j$ within net story $x.$.
The function $P_j$ with values $P_j[x_j|(x.)_{\Gamma_j}]$ determines a matrix that we call the node
matrix of node $\underline{x}_j$. $x_j$ is the matrix's row index and $(x.)_{\Gamma_j}$ is its column index. We
require that the values $P_j[x_j|(x.)_{\Gamma_j}]$ be conditional probabilities; i.e., that they satisfy:

$$ P_j[x_j|(x.)_{\Gamma_j}] \ge 0 , \tag{A.1} $$

$$ \sum_{x_j} P_j[x_j|(x.)_{\Gamma_j}] = 1 , \tag{A.2} $$

where the sum in Eq.(A.2) is over all the states that the random variable $\underline{x}_j$ can
assume, and where Eqs.(A.1) and (A.2) must be satisfied for all $j \in Z_{1,N}$ and for
all possible values of the vector $(\underline{x}.)_{\Gamma_j}$ of random variables. The left-hand side of
Eq.(A.2) is just the sum over the entries of a column of the node matrix.

The probability of net story $x.$, call it $P(x.)$, is defined to be the product of all
the node probabilities $P_j[x_j|(x.)_{\Gamma_j}]$ for $j \in Z_{1,N}$. Thus,

$$ P(x.) = \prod_{j \in Z_{1,N}} P_j[x_j|(x.)_{\Gamma_j}] . \tag{A.3} $$

We require $P(x.)$ to satisfy:

$$ \sum_{x.} P(x.) = 1 . \tag{A.4} $$



Call a CB pre-net a labelled graph and an accompanying set of node matrices 
that satisfy Eqs.(A.l), (A.2) and (A.3), but don't necessarily satisfy the overall nor- 
malization condition Eq.(A.4). It can be shown that all acyclic CB pre-nets satisfy 
Eq.(A.4). If one considers only acyclic graphs as we do in this paper, then there is 
no difference between CB nets and CB pre-nets. 
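
The statement that acyclic pre-nets automatically satisfy Eq.(A.4) is easy to see on a small case. The sketch below uses a hypothetical three-node acyclic net ($x_1 \to x_2$, $(x_1, x_2) \to x_3$, all nodes binary) with made-up node matrices; only the column normalization of Eq.(A.2) is imposed, and the total probability of all net stories is then computed from Eq.(A.3). NumPy is assumed.

```python
import numpy as np
from itertools import product

# Hypothetical node matrices; each column (fixed parent values) sums to one.
P1 = np.array([0.3, 0.7])                  # P1[x1], root node
P2 = np.array([[0.9, 0.2],
               [0.1, 0.8]])                # P2[x2, x1]
P3 = np.zeros((2, 2, 2))                   # P3[x3, x1, x2]
P3[0] = [[0.5, 0.4], [0.1, 1.0]]
P3[1] = 1.0 - P3[0]

def story_probability(x1, x2, x3):
    """Product of node probabilities for one net story, as in Eq.(A.3)."""
    return P1[x1] * P2[x2, x1] * P3[x3, x1, x2]

# Sum over all net stories; Eq.(A.4) says this should be one.
total = sum(story_probability(*story) for story in product((0, 1), repeat=3))
print(total)
```

Summing first over $x_3$, then $x_2$, then $x_1$ removes one normalized column at a time, which is why no separate normalization condition is needed when the graph has no cycles.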

(b) Quantum Bayesian Net 

For each net story $x.$ of a QB net, we assign a complex number
$A_j[x_j|(x.)_{\Gamma_j}]$ to each node $\underline{x}_j$. We call $A_j[x_j|(x.)_{\Gamma_j}]$ the amplitude of node $\underline{x}_j$ within
net story $x.$. The function $A_j$ with values $A_j[x_j|(x.)_{\Gamma_j}]$ determines a matrix that we
call the node matrix of node $\underline{x}_j$. $x_j$ is the matrix's row index and $(x.)_{\Gamma_j}$ is its column
index. We require that the quantities $A_j[x_j|(x.)_{\Gamma_j}]$ be probability amplitudes that
satisfy:

$$ \sum_{x_j} \left| A_j[x_j|(x.)_{\Gamma_j}] \right|^2 = 1 , \tag{A.5} $$

where the sum in Eq.(A.5) is over all the states that the random variable $\underline{x}_j$ can
assume, and where Eq. (A.5) must be satisfied for all $j \in Z_{1,N}$ and for all possible
values of the vector $(\underline{x}.)_{\Gamma_j}$ of random variables.

The amplitude of net story $x.$, call it $A(x.)$, is defined to be the product of all
the node amplitudes $A_j[x_j|(x.)_{\Gamma_j}]$ for $j \in Z_{1,N}$. Thus,

$$ A(x.) = \prod_{j \in Z_{1,N}} A_j[x_j|(x.)_{\Gamma_j}] . \tag{A.6} $$

We require $A(x.)$ to satisfy:

$$ \sum_{(x.)_{Z_{ext}}} \Big| \sum_{(x.)_{Z_{int}}} A(x.) \Big|^2 = 1 \tag{A.7} $$

and

$$ \sum_{x.} |A(x.)|^2 = 1 . \tag{A.8} $$




Note that as a consequence of Eqs.(A.5) and (A.8), given any QB net, one
can construct a special CB net by replacing at each node the value $A_j[x_j|(x.)_{\Gamma_j}]$ by
its magnitude squared. We call this special CB net the parent CB net of the QB net
from which it was constructed. We call it so because, given a parent CB net, one can
replace the value of each node by its square root times a phase factor. For a different
choice of phase factors, one generates a different QB net. Thus, a parent CB net may
be used to generate a whole family of QB nets.

A QB pre-net is a labelled graph and an accompanying set of node matrices
that satisfy Eqs.(A.5), (A.6) and (A.7), but don't necessarily satisfy Eq.(A.8). A QB
pre-net that is acyclic satisfies Eq.(A.8), because its parent CB pre-net is acyclic and
this implies that Eq.(A.8) is satisfied. If one considers only acyclic graphs as we do
in this paper, then there is no difference between QB nets and QB pre-nets. One
can check that all the examples of QB nets considered in this paper satisfy Eq.(A.8).
Eq.(A.8) is true iff the meta state $|\psi_{meta}\rangle$ defined by Eq.(6.2) has unit magnitude.
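
The relation between a QB net and its parent CB net can be illustrated on a two-node example. The sketch below is hypothetical: the net $x_1 \to x_2$, the probabilities and the phases `theta1`, `theta2` are made up for illustration, and NumPy is assumed. Node amplitudes are built as square roots of node probabilities times phase factors, after which Eq.(A.5) and Eq.(A.8) are checked directly.

```python
import numpy as np
from itertools import product

# Hypothetical parent CB net on two binary nodes, x1 -> x2.
P1 = np.array([0.3, 0.7])                       # P1[x1]
P2 = np.array([[0.9, 0.2],
               [0.1, 0.8]])                     # P2[x2, x1]

# Arbitrary phases; a different choice gives a different QB net
# with the same parent CB net.
theta1 = np.array([0.0, np.pi / 3])
theta2 = np.array([[0.0, np.pi],
                   [np.pi / 2, 0.0]])

A1 = np.sqrt(P1) * np.exp(1j * theta1)          # node amplitudes A1[x1]
A2 = np.sqrt(P2) * np.exp(1j * theta2)          # node amplitudes A2[x2, x1]

def story_amplitude(x1, x2):
    """Product of node amplitudes for one net story, as in Eq.(A.6)."""
    return A1[x1] * A2[x2, x1]

# Eq.(A.8): squared magnitudes of all story amplitudes sum to one.
norm = sum(abs(story_amplitude(x1, x2)) ** 2
           for x1, x2 in product((0, 1), repeat=2))

# Parent CB net: replace every node amplitude by its magnitude squared.
parent_P1 = np.abs(A1) ** 2
parent_P2 = np.abs(A2) ** 2
print(norm, parent_P1, parent_P2.sum(axis=0))
```

The phases drop out of `norm` and of the parent matrices, which is why one parent CB net generates a whole family of QB nets.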

References 

[1] Masud Mansuripur, Introduction to Information Theory (Prentice-Hall, 1987). 

[2] T. M. Cover, J. A. Thomas, Elements of Information Theory (Wiley, 1991). 

[3] C.W. Helstrom, Quantum Detection and Estimation (Academic, New York, 
1976). 



[4] A. Wehrl, "General Properties of Entropy", Rev. Mod. Phys. 50 221-260 (1978). 






[5] Asher Peres, Quantum Theory : Concepts and Methods (Kluwer, 1993). Chapter 
9, entitled "Information and Thermodynamics", is especially relevant to this 
paper. 

[6] C.M. Caves, P.D. Drummond, "Quantum Limits of Bosonic Communication
Rates", Rev. Mod. Phys. 66 481-537 (1994). 

[7] C. H. Bennett and P. Shor, "Quantum Information Theory", 
IEEE Trans. Info. Theory 44, 2724 (1998). Also available at 
http:/ /www. research. att.com/ shor/papers/index.html 

[8] J. Preskill, Lecture notes for Caltech course Physics 229. Available at:
http://www.theory.caltech.edu/people/preskill/ph229/

[9] B. Schumacher, Lectures given at University of Innsbruck, from 28 May to 12 Jun 
1998. Available at: http://www2.kenyon.edu/people/schumacb/lectures.htm 

[10] R. R. Tucci, Int. Jour, of Mod. Physics B9, 295 (1995). Available as Los Alamos 
eprint quant-ph/9706039. The theory of this paper is implemented by a computer 
program called "Quantum Fog", available at www.ar-tiste.com . 

[11] This analogy between Information Theory and Set Theory, and its pictorial rep-
resentation in terms of Venn diagrams, has been known since time immemorial. 
I'm not sure who was the first to point it out, but it seems to have been common 
knowledge less than five years after Shannon's 1948 paper that started it all. I 
suspect that the analogy can be phrased more generally and rigorously within the 
mathematical field of Lattice Algebras, but I know of no references to support 
this claim. 

[12] R.R. Tucci, "Data Processing Inequalities for Bayesian Nets", Los Alamos eprint 
quant-ph/? 

[13] This is very much in the spirit of N. J. Cerf, C. Adami, "Negative entropy and 
information in quantum mechanics", Phys.Rev.Lett. 79 (1997) 5194 (available 
as Los Alamos eprint quant-ph/951202. Note other Los Alamos eprints by same 
authors on similar topics.) Like us, Cerf and Adami advocate defining quantum 
conditional and mutual entropies so as to preserve the Venn diagrams which 
have been used in classical information theory for decades. However, there are 
some big differences between our work and theirs (apart from the obvious fact 
that they don't use Bayesian nets). For them the A and B in S{A\B) refer 
to separate "sub-systems" at the same instant of time. For us they are node 
random variables which need not represent separate subsystems. They might, 
for example, represent the same sub-system at different instants. 

[14] B. Schumacher, M. A. Nielsen, "Quantum data processing and error correction", 
Los Alamos eprint quant-ph/9604022. 






[15] B. Schumacher, "Sending quantum entanglement through noisy channels", Los 
Alamos eprint quant-ph/9604023. 

[16] It is also called a POVM, which stands for ((Positive Operator) Valued) Measure.
The reason for the long name is as follows.

In classical probability, one speaks of an event space $\Omega$ and a function $\mu$,
defined on subsets of $\Omega$ and taking non-negative real values, called a real-valued
measure. A random variable $\underline{b}$ on $\Omega$ is a function $\underline{b} : \Omega \rightarrow S_b$, where $S_b$ is the
set of values that $\underline{b}$ may assume. $P(\underline{b} = b)$ is defined by

$$ P(\underline{b} = b) = \mu(\{\omega \in \Omega \,|\, \underline{b}(\omega) = b\}) . $$

In quantum mechanics, one speaks of an event space $\Omega$, a Hilbert space
$\mathcal{H}$, a density matrix $\rho$ acting on $\mathcal{H}$, and a function $\mu_{op}$, defined on subsets of $\Omega$
and taking values in the non-negative operators acting on $\mathcal{H}$, called an operator-valued
measure. A random variable $\underline{b}$ is still a function $\underline{b} : \Omega \rightarrow S_b$. For each $b \in S_b$, one
defines an operator $F_b$ acting on $\mathcal{H}$ by

$$ F_b = \mu_{op}(\{\omega \in \Omega \,|\, \underline{b}(\omega) = b\}) . $$

Then $P(\underline{b} = b)$ is defined by

$$ P(\underline{b} = b) = \mathrm{tr}(F_b \rho) . $$

It's really $\mu_{op}$ that is a POM, but since the set $\{F_b\,|\,\forall b\}$ partly specifies $\mu_{op}$, we call
this set a POM too. For more information about POMs, see [6] and references
therein.

[17] A. S. Holevo, "Information Theoretical Aspects of Quantum Measurement", 
(Engl. Transl.) Problems of Information Transmission, 9, 177-183 (1973). 

[18] Andreas Winter, quant-ph/9907077; R. Ahlswede, P. Loeber, quant-ph/9907081. 
These workers from the Uni. of Bielefeld have also shown (working independently 
from me, and using a C* Algebra approach) that Holevo's Inequality follows from 
a Data Processing Inequality. 

[19] A. Peres, W.K. Wootters, "Optimal Detection of Quantum Information", Phys. 
Rev. Lett. 66 1119-1122 (1991). 

[20] C.H. Bennett, G. Brassard, C. Crepeau, R. Jozsa, A. Peres, W. Wootters, Phys. 
Rev. Lett., 70, 1895 (1993). 

[21] C.H. Bennett, S.J. Wiesner, Phys. Rev. Lett., 69, 2881 (1992). 






